DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file. 

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/20/2021 has been considered by the examiner and placed of record in the file.

35 USC § 112(f) (pre-AIA  35 USC 112, 6th) 
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. § 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph).  The presumption that § 112(f) (pre-AIA  § 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function. 
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. § 112(f) (pre-AIA  35 U.S.C. 112, sixth paragraph).  The presumption that § 112(f) (pre-AIA  § 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function. 
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke § 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke § 112(f) except as otherwise indicated in an Office action. 
Claim limitations in claims 1-10 have been interpreted under 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph, because they use the generic placeholders “map extraction module”, “box detection module” and “mask generation module” coupled with functional language such as “to receive”, “to classify” and “to generate” without reciting sufficient structure to achieve the functions. Furthermore, the generic placeholders are not preceded by a structural modifier.
Since these claim limitations invoke 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph, claims 1-10 are interpreted to cover the corresponding structures described in the specification that achieve the claimed functions, and equivalents thereof.  
A review of the specification shows that there appears to be no corresponding structure described in the specification for the 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA), sixth paragraph limitations.
In fact, the specification describes the term “module” as follows: “In addition, terms such as "... unit", "... group", and "module" described in the specification mean a unit that processes at least one function or operation, and it can be implemented as hardware or software or a combination of hardware and software” (Page 6, lines 7-10).
This description is broad and does not identify any specific hardware used to perform the claimed functions.
If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. 
If applicant does not wish to have the claim limitation treated under 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph, applicant may amend the claim so that it will clearly not invoke 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph, or present a sufficient showing that the claim recites sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or 35 U.S.C. 112 (pre-AIA ), sixth paragraph.
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance with 35 U.S.C. § 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).



Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


 

Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA applications, the applicant) regards as the invention.

Claims 1-10 recite means-plus-function limitations that invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structures, materials, or acts for the claimed functions, so the scope of these limitations cannot be determined and the claims are indefinite.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-7, 11-12 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (KR20200049451A – English translation).

Claim 1. Kim et al. disclose an object detection system (FIG. 1), comprising: 
a feature map extraction module configured to receive an image for object detection (read as The convolutional network system 100 is configured to process an input image to generate an output image… the output image includes bounding shape surrounding the contour of an object included in the image… [0014])  and extract a feature map (FIG 1, output of item 110) having multiple resolutions for the image (read as generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512. [0043]); 
a bounding box detection module configured to classify a bounding box (Kim et al.: read as  A pixel marked with '1' corresponds to a box surrounding the actual boundary of an object with a rectangular box, and a pixel marked with '0' corresponds to an area other than the box [0058]) by applying a first group of convolutional layers to the feature map (read as feature extractor 110 may include a plurality of layers L1 to L4 for repeatedly performing a convolution operation and a pooling operation [0029]), and predict the bounding box by applying a second group of convolutional layers to the feature map (read as feature extractor 110 may include a plurality of layers L1 to L4 for repeatedly performing a convolution operation and a pooling operation [0029]); and 
a mask generation module configured to generate a mask for the shape of the object in the bounding box using the feature map (read as …the fourth kernel K4 to output the fourth feature map FM4. [0032]).
The combined teachings of multiple embodiments of Kim et al. were used in the rejection. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Kim et al. in order to realize all limitations of the claimed invention, namely extracting a feature map of an input image at multiple resolutions, which could lower the processing burden of a system.

Claim 2. The system of claim 1, Kim et al. disclose,
wherein: 
the feature map extraction module constructs a feature pyramid that combines information of feature maps for each of multiple layers from the image (Kim et al.: read as a plurality of layers L1 to L4 for repeatedly performing a convolution operation and a pooling operation [0029]; FIGS. 2-4 and 7 show different layers applied at different stages of the feature map extraction), and extracts the feature map having the multiple resolutions (Kim et al.: read as generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512. [0043]) by using the feature pyramid (Kim et al.: read as The feature extractor 110 may be implemented as a feature pyramid network (FPN) [0034]).

Claim 3. The system of claim 2, Kim et al. disclose,
wherein: 
the feature map extraction module extracts the feature maps for each of the multiple layers from a backbone network (Kim et al.: read as That is, various numbers of layers may be used according to the type of network used as the backbone of the feature extractor 110 [0032] …The convolutional neural network system 100 updates the weight and bias values while backpropagating the obtained error δ [0075]), and generates the feature pyramid by adding the extracted feature maps for each of the multiple layers in reverse order (Kim et al.: read as The feature extractor 110 may be implemented as a feature pyramid network (FPN) [0034]).
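For illustration only, the limitation of generating a feature pyramid by adding the per-layer feature maps in reverse order can be sketched as below. The sketch is not taken from Kim et al. or from the specification. It assumes single-channel feature maps stored as nested lists, an exact 2× resolution step between layers, and nearest-neighbour upsampling; the function names are hypothetical.

```python
def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a 2-D list (assumed upsampling method)."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_pyramid(backbone_maps):
    """backbone_maps is ordered fine -> coarse; the result uses the same order.

    Layers are visited in reverse (coarse -> fine): each finer map is summed
    with the upsampled pyramid level built just before it.
    """
    pyramid = [backbone_maps[-1]]                  # the coarsest level is kept as-is
    for fm in reversed(backbone_maps[:-1]):
        top_down = upsample2x(pyramid[0])
        pyramid.insert(0, add_maps(fm, top_down))
    return pyramid


# Hypothetical backbone outputs at 4x4, 2x2 and 1x1 resolution.
c1 = [[1] * 4 for _ in range(4)]
c2 = [[10, 20], [30, 40]]
c3 = [[100]]
p1, p2, p3 = build_pyramid([c1, c2, c3])
```

Each returned level has the resolution of its backbone layer while carrying information summed in from every coarser layer, which is the sense in which the pyramid yields a feature map at multiple resolutions.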

Claim 4. The system of claim 1, Kim et al. disclose,
wherein: 
the bounding box detection module classifies the bounding box using a binary classifier (Kim et al.: read as  A pixel marked with '1' corresponds to a box surrounding the actual boundary of an object with a rectangular box, and a pixel marked with '0' corresponds to an area other than the box [0058]).

Claim 5. The system of claim 1, Kim et al. disclose,
wherein: 
the bounding box detection module sets offsets in multiple directions based on the center point of the object and then estimates the position and the size of the bounding box (Kim et al.: FIGs 7-17, different box structures shown).
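For illustration only, estimating the position and size of a bounding box from offsets set in multiple directions around a center point can be sketched as below. The sketch assumes four offsets (left, top, right, bottom) and image coordinates with y increasing downward; these choices and the function name are assumptions, not taken from Kim et al. or the specification.

```python
def box_from_offsets(cx, cy, left, top, right, bottom):
    """Return the corners and size of a box defined by four offsets from (cx, cy)."""
    x1, y1 = cx - left, cy - top        # top-left corner
    x2, y2 = cx + right, cy + bottom    # bottom-right corner
    return {
        "x1": x1, "y1": y1, "x2": x2, "y2": y2,
        "width": x2 - x1,
        "height": y2 - y1,
    }


# Hypothetical center point and offsets, in pixels.
box = box_from_offsets(cx=50, cy=40, left=10, top=5, right=20, bottom=15)
```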

Claim 6. The system of claim 5, Kim et al. disclose,
wherein: 
the bounding box detection module adjusts the reliability of the predicted bounding box based on the confidence score for the classification of the bounding box (Kim et al.: read as the present mask may be referred to as a Scored Bounding Box Mask [0062]) and the centeredness indicating the degree to which the predicted bounding box coincides with the ground truth (GT) (Kim et al.: read as a real mask used for comparison with a prediction mask according to an exemplary embodiment of the present disclosure [0061]).
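For illustration only, adjusting the reliability of a predicted bounding box from a classification confidence score and a centeredness value can be sketched as below. The centeredness formula is one common formulation from anchor-free detectors and is used here purely as an assumption, as is combining the two values by multiplication; neither is stated in Kim et al. or in the claims. The function names are hypothetical.

```python
import math


def centeredness(left, top, right, bottom):
    """Value in [0, 1]; equals 1.0 when the point is exactly centered in the box.

    Assumed formula: sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)).
    """
    if max(left, right) == 0 or max(top, bottom) == 0:
        return 0.0                      # degenerate box: treat as unreliable
    horizontal = min(left, right) / max(left, right)
    vertical = min(top, bottom) / max(top, bottom)
    return math.sqrt(horizontal * vertical)


def adjusted_score(confidence, left, top, right, bottom):
    """Down-weight the classification confidence of off-center predictions."""
    return confidence * centeredness(left, top, right, bottom)


# Hypothetical predictions with the same confidence but different offsets.
centered = adjusted_score(0.8, 10, 10, 10, 10)
off_center = adjusted_score(0.8, 5, 10, 20, 10)
```

Under these assumptions the centered prediction keeps its full confidence while the off-center one is reduced, so a later ranking or suppression step would prefer the centered box.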

Claim 7. The system of claim 1, Kim et al. disclose,
wherein: 
the mask generation module extracts an area corresponding to the bounding box from the feature map (Kim et al.: read as generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512. [0043]), and then performs warping with a feature map having a preset resolution (Kim et al.: read as generate a plurality of prediction masks by performing a 3×3 convolution operation on feature maps having a size of 28×28×512 [0043]).
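For illustration only, extracting the area corresponding to a bounding box from a feature map and warping it to a preset resolution can be sketched as below. The sketch assumes a single-channel feature map stored as a nested list, a box given as half-open integer pixel ranges, and nearest-neighbour sampling; these are assumptions rather than the method of Kim et al. or the specification, and the function name is hypothetical.

```python
def crop_and_warp(fmap, box, out_h, out_w):
    """Crop box = (x1, y1, x2, y2) from fmap and resample it to out_h x out_w."""
    x1, y1, x2, y2 = box                # half-open ranges [x1, x2) and [y1, y2)
    crop_h, crop_w = y2 - y1, x2 - x1
    warped = []
    for i in range(out_h):
        src_y = y1 + (i * crop_h) // out_h       # nearest source row
        row = []
        for j in range(out_w):
            src_x = x1 + (j * crop_w) // out_w   # nearest source column
            row.append(fmap[src_y][src_x])
        warped.append(row)
    return warped


# Hypothetical 6x6 feature map whose values encode (row, column) as row*10 + column.
fmap = [[r * 10 + c for c in range(6)] for r in range(6)]
warped = crop_and_warp(fmap, (1, 2, 5, 4), out_h=4, out_w=4)
```

Whatever the size of the box, the output always has the preset resolution, so a later stage can process every detected region with the same fixed-size layers.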

Claim 11. Kim et al. disclose an object detection method (FIG. 1-24), comprising: 
receiving an image for object detection (FIG. 1, input image); 
extracting feature maps for each of multiple layers from a backbone network (read as the feature extractor 110 may include a ResNet101, ResNet50, or similar network as a backbone [0017]); 
generating a feature pyramid (read as feature extractor 110 may be implemented as a feature pyramid network (FPN) [0034]) that combines information of feature maps for each of multiple layers (read as feature extractor 110 may include a plurality of layers L1 to L4 for repeatedly performing a convolution operation and a pooling operation [0029]) by adding the extracted feature maps for each of the multiple layers in reverse order (read as convolutional neural network system 100 may train a network constituting the segmentator 160 using back propagation [0026]); 
extracting a feature map having multiple resolutions for the image by using the feature pyramid (read as feature extractor 110 may be implemented as a feature pyramid network (FPN) [0034]); and 
generating a mask for the shape of the object using the feature map (read as …the fourth kernel K4 to output the fourth feature map FM4. [0032]) having multiple resolutions (read as generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512. [0043]).
The combined teachings of multiple embodiments of Kim et al. were used in the rejection. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Kim et al. in order to realize all limitations of the claimed invention, namely extracting a feature map of an input image at multiple resolutions, which could lower the processing burden of a system.

Claim 12. The method of claim 11, Kim et al. disclose,
wherein: 
the generating the mask comprises, extracting an area corresponding to the bounding box from the feature map (Kim et al.: read as  A pixel marked with '1' corresponds to a box surrounding the actual boundary of an object with a rectangular box, and a pixel marked with '0' corresponds to an area other than the box [0058]), and performing warping with a feature map having a preset resolution (Kim et al.: read as the segmentator 160 may generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512. [0043]).

Claim 16. Kim et al. disclose an object detection method (FIG. 1-24), comprising: 
applying a first group of convolutional layers to a feature map of an image for object detection (read as a plurality of layers L1 to L4 for repeatedly performing a convolution operation and a pooling operation [0029]; FIGS. 2-4 and 7 show different layers applied at different stages of the feature map extraction); 
classifying the bounding box using a binary classifier (read as  A pixel marked with '1' corresponds to a box surrounding the actual boundary of an object with a rectangular box, and a pixel marked with '0' corresponds to an area other than the box [0058]); 
applying a second group of convolutional layers to the feature map (read as a plurality of layers L1 to L4 for repeatedly performing a convolution operation and a pooling operation [0029]; FIGS. 2-4 and 7 show different layers applied at different stages of the feature map extraction); 
setting offsets in multiple directions based on the center point of the object and estimating the position and the size of the bounding box (FIG. 7-17, sizes of different bounding boxes shown); 
adjusting the reliability (read as a process of reducing errors by adjusting bias and/or weight values of networks constituting the segmentator 160 may be performed [0025] …the segmentator 160 may adjust the size of the aligned ROI in order to compare the prediction mask with the real mask [0043]) of the predicted bounding box based on the confidence score (read as the classifier 122 may calculate an “objectness score” indicating whether the searched area includes an object [0038]) for the classification of the bounding box and the centeredness indicating the degree to which the predicted bounding box coincides with the ground truth (GT) (read as a real mask used for comparison with a prediction mask according to an exemplary embodiment of the present disclosure [0061]); and 
generating a mask for the shape of the object in the bounding box using the feature map (read as …the fourth kernel K4 to output the fourth feature map FM4. [0032]).
The combined teachings of multiple embodiments of Kim et al. were used in the rejection. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Kim et al. in order to realize all limitations of the claimed invention and thereby extract feature maps of input images with increased accuracy.

Claim 17. The method of claim 16, Kim et al. disclose,
wherein: 
the generating the mask comprises, extracting an area corresponding to the bounding box from the feature map (read as the segmentator 160 may generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512 [0043]), and performing warping with a feature map having a preset resolution (read as the segmentator 160 may generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512 [0043]).

Claims 8-10, 13-15 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (KR20200049451A) in view of Li et al. (SCAttNet: Semantic Segmentation Network With Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images).

Claim 8. The system of claim 7, Kim et al. disclose,
wherein: 
the mask generation module obtains a convolutional feature map by applying a convolutional layer to the warped feature map (Kim et al.: read as  generate a plurality of prediction masks by performing a 3×3 convolution operation on feature maps having a size of 28×28×512 [0043]), and 
Kim et al. do not explicitly disclose:
combines a maximum pooling feature map and an average pooling feature map by performing maximum pooling and average pooling on the convolutional feature map.
However, in the related field of endeavor Li et al. disclose: … average pooling and global max pooling to generate two feature descriptors for each channel. Then, we feed the two feature descriptors into a shared multilayer perceptron… (Section II-C).
FIG. 1 shows combining average and maximum pooling.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teaching of Kim et al. with the teaching of Li et al. in order to obtain an end-to-end semantic segmentation network that integrates lightweight spatial and channel attention modules that can refine features adaptively (Li et al.: Abstract).
 
Claim 9. The system of claim 8, the combination of Kim et al. and Li et al. teaches,
wherein: 
the mask generation module obtains an attention map by applying a nonlinear function to the combined maximum pooling feature map and average pooling feature map (Li et al.: read as … average pooling and global max pooling to generate two feature descriptors for each channel. Then, we feed the two feature descriptors into a shared multilayer perceptron… (Section II-C)).

Claim 10. The system of claim 9, the combination of Kim et al. and Li et al. teaches,
wherein: 
the mask generation module multiplies the attention map and the convolutional feature map (Li et al. FIG. 1), and performs a binary classification on the multiplied result to generate the mask (Kim et al.: read as  A pixel marked with '1' corresponds to a box surrounding the actual boundary of an object with a rectangular box, and a pixel marked with '0' corresponds to an area other than the box [0058]).
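For illustration only, the sequence recited in claims 8-10 (maximum and average pooling of a convolutional feature map, a nonlinear function to obtain an attention map, multiplication, and binary classification) can be sketched as below. The sketch assumes channel-wise global pooling, combines the two pooled values by simple addition in place of the shared multilayer perceptron quoted from Li et al., uses a sigmoid as the nonlinear function, and thresholds the channel sum at 0.5 as the binary classification. All of these choices and the function names are assumptions, not the method of Li et al., Kim et al., or the specification.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """fmap is a list of channels, each a 2-D list. Returns one weight per channel."""
    weights = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        max_pooled = max(values)                  # global maximum pooling
        avg_pooled = sum(values) / len(values)    # global average pooling
        # Assumption: the pooled descriptors are combined by addition.
        weights.append(sigmoid(max_pooled + avg_pooled))
    return weights


def apply_attention(fmap, weights):
    """Multiply every value of each channel by that channel's attention weight."""
    return [[[v * w for v in row] for row in ch] for ch, w in zip(fmap, weights)]


def binary_mask(fmap, threshold=0.5):
    """Assumed binary classification: threshold the per-pixel sum over channels."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [1 if sum(ch[i][j] for ch in fmap) > threshold else 0 for j in range(width)]
        for i in range(height)
    ]


# Hypothetical two-channel 2x2 convolutional feature map.
conv_fm = [
    [[2.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = channel_attention(conv_fm)
mask = binary_mask(apply_attention(conv_fm, weights))
```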

Claim 13. The method of claim 12, Kim et al. disclose,
wherein: 
the generating the mask comprises, obtaining a convolutional feature map by applying a convolutional layer to the warped feature map (read as the segmentator 160 may generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512 [0043]); and 
Kim et al. do not explicitly disclose:
combining a maximum pooling feature map and an average pooling feature map by performing maximum pooling and average pooling on the convolutional feature map.
However, in the related field of endeavor Li et al. disclose: … average pooling and global max pooling to generate two feature descriptors for each channel. Then, we feed the two feature descriptors into a shared multilayer perceptron… (Section II-C).
FIG. 1 shows combining average and maximum pooling.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teaching of Kim et al. with the teaching of Li et al. in order to obtain an end-to-end semantic segmentation network that integrates lightweight spatial and channel attention modules that can refine features adaptively (Li et al.: Abstract).

Claim 14. The method of claim 13, the combination of Kim et al. and Li et al. teaches,
wherein: 
the generating the mask comprises, obtaining an attention map by applying a nonlinear function to the combined maximum pooling feature map and average pooling feature map (Li et al.: read as … average pooling and global max pooling to generate two feature descriptors for each channel. Then, we feed the two feature descriptors into a shared multilayer perceptron… (Section II-C)).

Claim 15. The method of claim 14, the combination of Kim et al. and Li et al. teaches,
wherein: 
the generating the mask comprises, multiplying the attention map and the convolutional feature map (Li et al. FIG. 1), and performing a binary classification on the multiplied result to generate the mask (Kim et al.: read as  A pixel marked with '1' corresponds to a box surrounding the actual boundary of an object with a rectangular box, and a pixel marked with '0' corresponds to an area other than the box [0058]).

Claim 18. The method of claim 17, Kim et al. disclose,
wherein: 
the generating the mask comprises, obtaining a convolutional feature map by applying a convolutional layer to the warped feature map (read as the segmentator 160 may generate feature maps having a size of 28×28×512 by performing a deconvolution operation on the aligned ROIs having a size of 14×14×512 [0043]); and 
Kim et al. do not explicitly disclose:
combining a maximum pooling feature map and an average pooling feature map by performing maximum pooling and average pooling on the convolutional feature map.
However, in the related field of endeavor Li et al. disclose: … average pooling and global max pooling to generate two feature descriptors for each channel. Then, we feed the two feature descriptors into a shared multilayer perceptron… (Section II-C).
FIG. 1 shows combining average and maximum pooling.
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teaching of Kim et al. with the teaching of Li et al. in order to obtain an end-to-end semantic segmentation network that integrates lightweight spatial and channel attention modules that can refine features adaptively (Li et al.: Abstract).

Claim 19. The method of claim 18, the combination of Kim et al. and Li et al. teaches,
wherein: 
the generating the mask comprises, obtaining an attention map by applying a nonlinear function to the combined maximum pooling feature map and average pooling feature map (Li et al.: read as … average pooling and global max pooling to generate two feature descriptors for each channel. Then, we feed the two feature descriptors into a shared multilayer perceptron… (Section II-C)).

Claim 20. The method of claim 19, the combination of Kim et al. and Li et al. teaches,
wherein: 
the generating the mask comprises, multiplying the attention map and the convolutional feature map, and performing a binary classification on the multiplied result to generate the mask (Kim et al.: read as  A pixel marked with '1' corresponds to a box surrounding the actual boundary of an object with a rectangular box, and a pixel marked with '0' corresponds to an area other than the box [0058]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMED RACHEDINE whose telephone number is (571)272-9249. The examiner can normally be reached Mon-Fri 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lester Kincaid, can be reached at (571)272-7922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MOHAMMED RACHEDINE
Examiner
Art Unit 2649



/MOHAMMED RACHEDINE/Primary Examiner, Art Unit 2646