DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1, 3-13 and 15-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Since independent claim 1 has been amended to recite sufficient structure that modifies the functional limitations, the claims are no longer interpreted as invoking 35 USC 112(f).
In view of the amendments to claim 11, the rejection under 35 USC 112 has been withdrawn.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 3-13, and 15-22 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled 
Regarding Claims 1, 13, 20 and 21, the newly added feature of “wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image” does not appear to be supported by the originally filed disclosure. The candidate regions in the claimed invention are allocated for the purpose of detecting objects in the image (“detecting a position of the object from the image based on…the candidate regions”). That is, the allocation of candidate regions must be made prior to the detection of the objects. It is therefore unclear how the number of candidate regions can be determined based on detected objects when the object detection is itself predicated on the candidate regions. The portions of the specification that Applicant cites as supporting this feature ([0062-0071] of the publication of the subject application) disclose that it is the position distribution of features, rather than detected objects, that determines the number of candidate regions.
Claims 3-12, 15-19 and 22 are rejected for being dependent on a rejected base claim without curing any of the deficiencies.
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 and 3-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitation "the one or more processors" in line 3 of claim 1. There is insufficient antecedent basis for this limitation in the claim.
Claims 3-12 are rejected for being dependent on a rejected base claim without curing any of the deficiencies.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-7, 9, 10, 12, 13 and 15-22 are rejected under 35 U.S.C. 103 as being unpatentable over Fukagai et al. (US PG-Pub US 20200074690 A1) in view of Jeon (US PG-Pub US 20190072977 A1).
Regarding Claim 1, Fukagai teaches a detection apparatus comprising: one or more memories storing executable instructions which, when executed by the one or more processors, cause the detection apparatus to perform operations including (¶[0069] The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs.): extracting features from an image (¶[0087] The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. The CNN layer 33 has a deep neural network including thirteen convolutional layers, four pooling layers, and three fully connected layers. Each convolutional layer extracts information on features, such as edges, using predetermined filters.); determining candidate regions of an object in the image based on the extracted features (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. 
The examiner interprets that the neural network of the prior art is using the features extracted in the features map to determine the number of rectangular shaped regions and to place those rectangular shaped regions at the location of the object. The extracted features of feature map are used to calculate a score for each object candidate region and the RPN layer removes overlapping regions to generate a determined number of regions to be 300 object candidate regions.); detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101] The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that based on the score from each rectangular shaped region used to detect the location of the object as shown in Fig. 5, that the prior art is able to calculate and determine the type of object present and where it is located in the image.), and wherein, in the image, more candidate regions are allocated to a region where feature distribution of the object is denser (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32. The examiner interprets that the RPN layer will generate a large number of rectangular regions in areas where object features are more densely located in the image relative to background portions, as seen in Fig. 5 in the top image.); the feature distribution being obtained based on the extracted features. 
(¶[0088], calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The examiner interprets that the feature distribution is based on the score of the probability of the object being present in the candidate region and the score is being calculated by the extracted features generated by the feature map.)
Fukagai does not explicitly teach wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image.
Jeon teaches wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image (¶[0023], a processor configured to extract, in parallel with a generation of the feature map, a region of interest (ROI) corresponding to an object of interest from the input image, and to determine, based on a size of the ROI, a number of object candidate regions used to detect the object of interest, wherein the neural network is further configured to recognize the object of interest from the ROI based on the number of object candidate regions. [0048], the faster R-CNN 110 includes a classifier 119 configured to estimate an object class and a background, and a bounding box regressor (not shown) configured to output a position of each object class. The classifier 119 is, for example, a softmax classifier. In an example, the ROI pooling layer 117 and the classifier 119 correspond to a detection network configured to recognize an object. The classifier 119 and the bounding box regressor are connected to a rear end of the FC layer. The examiner interprets that the prior art is using the size of the ROI to determine how many object candidate regions are in the image and as seen in ¶[0048], the region of interest corresponds to the number of objects and the position in the image.) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the apparatus of Fukagai with the teachings of Jeon in order to determine the number of candidate regions based on the position and number of objects in the image. One skilled in the art would have been motivated to modify Fukagai in this manner in order to improve the object recognition speed (Jeon, ¶[0052]).
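For illustration only, the score-ranking step quoted above from Fukagai ¶[0088] (selecting the highest-scoring candidate regions in descending order of score) can be sketched as follows. This is a minimal sketch and is not code from either reference; the function name `select_top_candidates`, the `(x1, y1, x2, y2)` box layout, and the demo values are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout (x1, y1, x2, y2); neither reference prescribes one.
Box = Tuple[float, float, float, float]


def select_top_candidates(
    boxes: Sequence[Box], scores: Sequence[float], top_k: int
) -> List[int]:
    """Return the indices of the top_k highest-scoring boxes, best first."""
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    # Rank candidate indices by score in descending order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep at most top_k candidates (6000 in the quoted passage).
    return order[:top_k]


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)]
    demo_scores = [0.2, 0.9, 0.5]
    print(select_top_candidates(demo_boxes, demo_scores, top_k=2))
```

In this sketch the number of retained regions is capped by a fixed `top_k` and ordered purely by score, which is the behavior the quoted passage describes for the RPN layer.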
Regarding Claim 3, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 1, where Fukagai further teaches wherein the number of the candidate regions is determined according to position distribution of the candidate regions; (¶[0088], The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32. The examiner interprets that based on the location of the detected object then a rectangular shaped region is placed to detect the object as seen in figure 5. ¶[0088] The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that based on the position of the regions if they overlap then the RPN layer will remove and output a final total of 300 candidate regions used to detect the object.); wherein there are the same number of candidate regions at each position in the position distribution of the candidate regions (¶[0088] The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that once an object is detected in the image it will have 300 object candidate regions as shown in Fig 5. The top image shows the detected object with the associated regions of the image.).
Regarding Claim 4, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 3, where Fukagai further teaches wherein the position distribution of the candidate regions is obtained by comparing feature values in the feature distribution or normalized values of the feature values with a predefined threshold value(¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being 
Regarding Claim 5, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 1, where Fukagai further teaches wherein the number of the candidate regions is determined according to shape distribution of the candidate regions(¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The examiner interprets that the shape distribution of the candidate region is dependent on the extracted features of the object. As seen in Fig. 5 of the prior art there are different sizes of rectangular shaped boxes around each object detected and the RPN layer will determine the number of candidate regions to place on the object for detection.); wherein the shape distribution of the candidate regions is composed of the number of the candidate regions which can be present at a position corresponding to each of feature values in the feature distribution or normalized values of the feature values. (¶[0088], The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate 
Regarding Claim 6, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 3, where Fukagai further teaches wherein the number of the candidate regions is smaller than or equal to a predefined value (¶[0088], the RPN layer 35 selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets the predefined value to be the top 6000 candidate regions and the determined number to be the final output of candidate regions which is 300); wherein the position distribution of the candidate regions is obtained based on the predefined value. (As seen in figure 5, the position distribution of the candidate region is based on the predefined value since the predefined value was the top 6000 object candidate regions when it came to score. The neural network is using that predefined value of regions in order reduce the overlap between regions such that it can have a final determined amount of candidate object regions which is 300. ).
Regarding Claim 7, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 5, where Fukagai further teaches wherein the number of the candidate regions is smaller than or equal to a predefined value (¶[0088], the RPN layer 35 selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets the predefined value to be the top 6000 candidate regions and the determined number to be the final output of candidate regions which is 300); wherein the shape distribution of the candidate regions is obtained based on the predefined value (As seen in figure 5, bottom image, the shape distribution of the candidate region is based on the predefined value since the predefined value was the top 6000 object candidate regions when it came to score. The neural network is using that predefined value of regions in order to reduce the overlap between regions such that the shape of the region covers the object in order to show the recognition result).
Regarding Claim 9, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 1, where Fukagai further teaches wherein the position of the object in the image is extracted by executing a regression operation based on the extracted features and the determined number, position and shape of the candidate regions. (¶[0089] The fast R-CNN layer 37 determines, based on the feature map 34 and the object candidate regions 36, the class of an object captured in each object candidate region and calculates a score indicating the credibility measure of the determination result. The fast R-CNN layer 37 selects a small number of image regions with sufficiently high scores. The recognition results 38 output from the fast R-CNN layer 37 include, for each selected image region, the location of the image region, the determined object class, and the score indicating the probability that the object is the particular class. The examiner interprets the neural network's generation of a probability or correlation score, based on the extracted features of the object in the rectangular shaped region as shown in figure 5, to be the regression operation).
Regarding Claim 10, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 9, wherein different predefined threshold values for the portions in the image, which are respectively corresponding to different density distribution values in density distribution, are set by using the density distribution of the object in the image (¶[0221] A score-associated location data 151 is a collection of elements indicating object candidate regions, which are sorted in descending ∪B), and indicates the ratio of the overlapping area between the object candidate regions A and B to the sum of the areas of the two regions. IoU is closer to 1 when there is a higher degree of region overlap while IoU is closer to 0 when there is a lower degree of region overlap. When IoU of the object candidate regions A and B exceeds a threshold (e.g. 0.7), the selecting unit 139 deletes, of the object candidate regions A and B, one with a lower score.); a final object detection result is obtained based on an object detection result obtained by the regression operation and the predefined threshold values. (¶[0091] FIG. 5 illustrates an example of the recognition results. [0092] An image 41 of FIG. 5 is a displayed image with bounding boxes indicating object candidate regions output from the RPN layer 35, superimposed over an input image. The RPN layer 35 outputs 300 object candidate regions for a single input image; however, in the example of FIG. 5, only a small number of object candidate regions are presented to facilitate understanding. The image 41 captures, as objects to be detected, a car, a dog, a horse, and two people.)
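For illustration only, the overlap test quoted above from Fukagai ¶[0221] (delete the lower-scoring of two candidate regions when their IoU exceeds a threshold such as 0.7) can be sketched as follows. This is a minimal sketch and is not code from the reference. It assumes the conventional intersection-over-union definition (intersection area divided by union area), an `(x1, y1, x2, y2)` box layout, and the hypothetical names `iou` and `suppress_overlaps`; the default cap of 300 mirrors the output count quoted from ¶[0088].

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout (x1, y1, x2, y2); the reference does not prescribe one.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (assumed definition)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.7,
    max_keep: int = 300,
) -> List[int]:
    """Greedily keep high-scoring boxes, dropping any box whose IoU with an
    already-kept box exceeds iou_threshold. Returns kept indices, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
            if len(kept) == max_keep:
                break
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(suppress_overlaps(demo_boxes, demo_scores))
```

Note that in this sketch a single threshold is applied uniformly to every pair of regions; the threshold does not vary by image portion or by object density.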
Regarding Claim 12, the combination of Fukagai and Jeon teaches the detection apparatus according to Claim 1, where Fukagai further teaches wherein the distribution is obtained by using a pre-generated neural network.
Regarding Claim 13, Fukagai teaches a detecting method comprising: extracting features from an image (¶[0087] The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. The CNN layer 33 has a deep neural network including thirteen convolutional layers, four pooling layers, and three fully connected layers. Each convolutional layer extracts information on features, such as edges, using predetermined filters.); determining candidate regions of an object in the image based on the extracted features (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that the neural network of the prior art is using the features extracted in the features map to determine the number of rectangular shaped regions and to place those rectangular shaped regions at the location of the object. 
The extracted features of feature map are used to calculate a score for each object candidate region and the RPN layer removes overlapping regions to generate a determined number of regions to be 300 object candidate regions.); detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101] The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that based on the score from each rectangular shaped region used to detect the location of the object as shown in Fig. 5, that the prior art is able to calculate and determine the type of object present and where it is located in the image.), and wherein, in the image, more candidate regions are allocated to a region where feature distribution of the object is denser (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32. The examiner interprets that the RPN layer will generate a large number of rectangular regions in areas where object features are more densely located in the image relative to background portions, as seen in Fig. 5 in the top image.); the feature distribution being obtained based on the extracted features. (¶[0088], calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The examiner interprets that the feature distribution is based on the score of the probability of the object being present in the candidate region and the score is being calculated by the extracted features generated by the feature map.)
Fukagai does not explicitly teach wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image
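Jeon teaches wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image (¶[0023], a processor configured to extract, in parallel with a generation of the feature map, a region of interest (ROI) corresponding to an object of interest from the input image, and to determine, based on a size of the ROI, a number of object candidate regions used to detect the object of interest, wherein the neural network is further configured to recognize the object of interest from the ROI based on the number of object candidate regions. ¶[0048], the faster R-CNN 110 includes a classifier 119 configured to estimate an object class and a background, and a bounding box regressor (not shown) configured to output a position of each object class. The classifier 119 is, for example, a softmax classifier. In an example, the ROI pooling layer 117 and the classifier 119 correspond to a detection network configured to recognize an object. The classifier 119 and the bounding box regressor are connected to a rear end of the FC layer. The examiner interprets that the prior art is using the size of the ROI to determine how many object candidate regions are in the image and as seen in ¶[0048], the region of interest corresponds to the number of objects and the position in the image.)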
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Fukagai with the teachings of Jeon in order to determine the number of candidate regions based on the position and number of objects in the image. One skilled in the art would have been motivated to modify Fukagai in this manner in order to improve the object recognition speed (Jeon, ¶[0052]).
Regarding Claim 15, the combination of Fukagai and Jeon teaches the detecting method according to Claim 13, where Fukagai further teaches wherein the number of the candidate regions is determined according to position distribution of the candidate regions; (¶[0088], The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32. The examiner interprets that based on the location of the detected object then a rectangular shaped region is placed to detect the object as seen in figure 5. ¶[0088] The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that based on the position of the regions if they overlap then the RPN layer will remove and output a final total of 300 candidate regions used to detect the object.); wherein there are the same number of candidate regions at each position in the position distribution of the candidate regions (¶[0088] The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that once an object is detected in the image it will have 300 object candidate regions as shown in Fig 5. The top image shows the detected object with the associated regions of the image.).
Regarding Claim 16, the combination of Fukagai and Jeon teaches the detecting method according to claim 13, where Fukagai further teaches wherein the number of the candidate regions is determined according to shape distribution of the candidate regions(¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The examiner interprets that the shape distribution of the candidate region is dependent on the extracted features of the object. As seen in Fig. 5 of the prior art there are different sizes of rectangular shaped boxes around each object detected and the RPN layer will determine the number of candidate regions to place on the object for detection.);wherein the shape distribution of the candidate regions is composed of the number of the candidate regions which can be present at a position corresponding to each of feature values in the feature distribution or normalized values of the feature values(¶[0088], The RPN layer 35 sets, as object . The examiner interprets that the shape of the candidate region is determined by comparing the score the feature distribution of each region if it contains the object. The RPN layer removes overlapping rectangular shaped regions in order for the object to be fully recognized as seen in figure 5.).
Regarding Claim 17, the combination of Fukagai and Jeon teaches the detecting method according to Claim 16, where Fukagai further teaches wherein the number of the candidate regions is smaller than or equal to a predefined value(¶[0088], the RPN layer 35 selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets the predefined value to be the top 6000 candidate regions and the determined number to be the final output of candidate regions which is 300); wherein the shape distribution of the candidate regions is obtained based on the predefined value (As seen in figure 5, bottom image, the shape distribution of the candidate region is based on the predefined value since the predefined value was the top 6000 object candidate regions when it came to score. The neural network is using that predefined value of regions in order reduce the overlap between regions such that the shape of the region covers the object in order to show the recognition result).
Regarding Claim 18, the combination of Fukagai and Jeon teaches the detecting method according to Claim 13, where Fukagai further teaches wherein the position of the object is extracted from the image by executing a regression operation based on the extracted features and the determined number, position and shape of the candidate regions. (¶[0089] The fast R-CNN layer 37 determines, based on the feature map 34 and the object candidate regions 36, the class of an object captured in each object candidate region and calculates a score indicating the credibility measure of the determination result. The fast R-CNN layer 37 selects a small number of image regions with sufficiently high scores. The recognition results 38 output from the fast R-CNN layer 37 include, for each selected image region, the location of the image region, the determined object class, and the score indicating the probability that the object is the particular class. The examiner interprets that neural network is generating a probability or correlation score based on the extracted features of the object in the rectangular shaped region as shown in figure 5 to be the regression operation).
Regarding Claim 19, the combination of Fukagai and Jeon teaches the detecting method according to claim 13, where Fukagai further teaches wherein the feature distribution is obtained by using a pre-generated neural network. (¶[0066], An image recognition apparatus 100 according to the second embodiment uses a neural network to implement image recognition for determining the location and class of each object in an input image. The examiner interprets that all the image processing being performed in the prior art is done by a neural network.).
Regarding Claim 20, Fukagai teaches an image processing apparatus comprising: an acquisition device which acquires an image or video (¶[0097], The input image may be captured by an image pickup device coupled to the image recognition apparatus 100.); a storage device which stores an instruction (¶[0069], The RAM 102 is volatile semiconductor memory for temporarily storing therein programs to be executed by the CPU 101 and data to be used by the CPU 101 for its computation); and a processor which executes the instruction based on the acquired image or video, such that the processor at least implements the following steps (¶[0069] The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs.): extract features from the image (¶[0087] The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. The CNN layer 33 has a deep neural network including thirteen convolutional layers, four pooling layers, and three fully connected layers. Each convolutional layer extracts information on features, such as edges, using predetermined filters.); determine candidate regions of an object in the image based on the extracted features (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that the neural network of the prior art uses the features extracted in the feature map to determine the number of rectangular shaped regions and to place those rectangular shaped regions at the location of the object. The extracted features of the feature map are used to calculate a score for each object candidate region, and the RPN layer removes overlapping regions so that the determined number of regions is 300 object candidate regions.); detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101] The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that, based on the score from each rectangular shaped region used to detect the location of the object as shown in Fig. 5, the prior art is able to calculate and determine the type of object present and where it is located in the image.), and wherein, in the image, more candidate regions are allocated to a region where feature distribution of the object is denser (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32. The examiner interprets that the RPN layer will generate a large number of rectangular regions in areas where object features are more densely located in the image relative to background portions, as seen in Fig. 5 in the top image.); the feature distribution being obtained based on the extracted features (¶[0088], calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The examiner interprets that the feature distribution is based on the score of the probability of the object being present in the candidate region, and that the score is calculated from the extracted features in the feature map.)
Fukagai does not explicitly teach wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image.
Jeon teaches wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image (¶[0023], a processor configured to extract, in parallel with a generation of the feature map, a region of interest (ROI) corresponding to an object of interest from the input image, and to determine, based on a size of the ROI, a number of object candidate regions used to detect the object of interest, wherein the neural network is further configured to recognize the object of interest from the ROI based on the number of object candidate regions. ¶[0048], the faster R-CNN 110 includes a classifier 119 configured to estimate an object class and a background, and a bounding box regressor (not shown) configured to output a position of each object class. The classifier 119 is, for example, a softmax classifier. In an example, the ROI pooling layer 117 and the classifier 119 correspond to a detection network configured to recognize an object. The classifier 119 and the bounding box regressor are connected to a rear end of the FC layer. The examiner interprets that the prior art is using the size of the ROI to determine how many object candidate regions are in the image and, as seen in ¶[0048], the region of interest corresponds to the number of objects and the position in the image.)
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention taught by Fukagai with the teachings of Jeon in order to determine the number of regions based on the position and number of objects in the image. One skilled in the art would have been motivated to modify Fukagai in this manner in order to improve the object recognition speed. (Jeon, ¶[0052])
Regarding Claim 21, Fukagai teaches an image processing system comprising: an acquisition apparatus which acquires an image or video (¶[0097], The input image may be captured by an image pickup device coupled to the image recognition apparatus 100.); a detection apparatus for detecting an object from the acquired image or video by extracting features from the image (¶[0087] The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. Fig. 2 of the prior art shows a CPU 101, and in ¶[0069], The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs. The examiner interprets that the CPU stores the neural network and is capable of extracting features from the input image.); determine candidate regions of an object in the image based on the extracted features (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions.
The examiner interprets that the neural network of the prior art uses the features extracted in the feature map to determine the number of rectangular shaped regions and to place those rectangular shaped regions at the location of the object. The extracted features of the feature map are used to calculate a score for each object candidate region, and the RPN layer removes overlapping regions so that the determined number of regions is 300 object candidate regions.); detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101] The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that, based on the score from each rectangular shaped region used to detect the location of the object as shown in Fig. 5, the prior art is able to calculate and determine the type of object present and where it is located in the image.),
and a processing apparatus which executes a subsequent image processing operation based on the detected object (¶[0071] The image signal processing unit 105 produces video images in accordance with drawing commands from the CPU 101 and displays them on a screen of a display 111 coupled to the image recognition apparatus 100. The examiner interprets that the image signal processing unit processes the image by taking the video images from the CPU and displaying them on a display, showing the recognition results of the image recognition.), wherein the acquisition apparatus, the detection apparatus and the processing apparatus are connected each other via a network (¶[0097], The input image may be captured by an image pickup device coupled to the image recognition apparatus 100. Alternatively, the input image may be input to the image recognition apparatus 100 by the user, or sent to the image recognition apparatus 100 from a different information processor via the network 114. The examiner interprets the image pickup device to be the acquisition apparatus, the image recognition apparatus to be the detection apparatus, and the different information processor to be the processing apparatus, all of which are connected via the network.),
and wherein, in the image, more candidate regions are allocated to a region where feature distribution of the object is denser (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32. The examiner interprets that the RPN layer will generate a large number of rectangular regions in areas where object features are more densely located in the image relative to background portions, as seen in Fig. 5 in the top image.); the feature distribution being obtained based on the extracted features (¶[0088], calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The examiner interprets that the feature distribution is based on the score of the probability of the object being present in the candidate region, and that the score is calculated from the extracted features in the feature map.)
Fukagai does not explicitly teach wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image.
Jeon teaches wherein a number of the candidate regions is decided by at least a position and a number of the object detected from the image (¶[0023], a processor configured to extract, in parallel with a generation of the feature map, a region of interest (ROI) corresponding to an object of interest from the input image, and to determine, based on a size of the ROI, a number of object candidate regions used to detect the object of interest, wherein the neural network is further configured to recognize the object of interest from the ROI based on the number of object candidate regions. ¶[0048], the faster R-CNN 110 includes a classifier 119 configured to estimate an object class and a background, and a bounding box regressor (not shown) configured to output a position of each object class. The classifier 119 is, for example, a softmax classifier. In an example, the ROI pooling layer 117 and the classifier 119 correspond to a detection network configured to recognize an object. The classifier 119 and the bounding box regressor are connected to a rear end of the FC layer. The examiner interprets that the prior art is using the size of the ROI to determine how many object candidate regions are in the image and, as seen in ¶[0048], the region of interest corresponds to the number of objects and the position in the image.)
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention taught by Fukagai with the teachings of Jeon in order to determine the number of regions based on the position and number of objects in the image. One skilled in the art would have been motivated to modify Fukagai in this manner in order to improve the object recognition speed. (Jeon, ¶[0052])
Regarding Claim 22, the combination of Fukagai and Jeon teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the detecting method according to claim 13 (¶[0069] The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs. The RAM 102 is volatile semiconductor memory for temporarily storing therein programs to be executed by the CPU 101 and data to be used by the CPU 101 for its computation).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Fukagai et al. US PG-Pub (US 20200074690 A1) in view of Jeon US PG-Pub (US 20190072977 A1) in view of El-Khamy et al. US PG-Pub (US 20180089505 A1).
Regarding Claim 8, while Fukagai and Jeon teach the detection apparatus according to Claim 1, they do not explicitly teach wherein for objects having different scales in the image, the features are extracted with different levels respectively from the image; wherein for each feature among the extracted features with different levels, the determination unit and the detection unit execute corresponding operations.
El-Khamy teaches wherein for objects having different scales in the image, the features are extracted with different levels respectively from the image (¶[0045], a primary object detector includes ); wherein for each feature among the extracted features with different levels, the determination unit and the detection unit execute corresponding operations (¶[0045], Bounding box candidates of different sizes and aspect ratios at each location of the extracted features are further classified as an object or background in the captured image, and localization offsets of the candidate bounding boxes are calculated by bounding box regressions. The examiner interprets that, after the features are extracted, the prior art uses bounding boxes around the extracted features to determine whether an object is present, which is similar in functionality to what the determination unit and the detection unit do in the instant application).
It would have been obvious to one of ordinary skill in the art before the effective filing date to add the teaching of El-Khamy to Fukagai and Jeon in order to extract features at different scales of the image. One skilled in the art would have been motivated to modify Fukagai and Jeon in this manner in order to perform fast and robust object detection. (El-Khamy, ¶[0002])
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Fukagai et al. US PG-Pub (US 20200074690 A1) in view of Jeon US PG-Pub (US 20190072977 A1) in view of Yu et al. US PG-Pub (US 20140028442 A1).
Regarding Claim 11, while Fukagai and Jeon teach the detection apparatus according to Claim 1, they do not explicitly teach wherein the image is divided based on density distribution of the objects in the image, into: a first portion for outputting the determined position and shape of the candidate regions, so as to output a position and shape of the candidate regions determined at a position corresponding to the first portion; and a second portion from which corresponding objects are detected, so as to detect corresponding objects from the image corresponding to the second portion.
Yu teaches wherein the image is divided based on density distribution of the objects in the image (As seen in Figure 4, the image is divided up and regions 1, 5, 6, and 7 show the density distributions of the object), into: a first portion for outputting the determined position and shape of the candidate regions, so as to output a position and shape of the candidate regions determined at a position corresponding to the first portion (¶[0071] The size of the segmented detection region and the number of segmented detection regions are determined as required. In addition, in any of the segmented detection regions, any number of binary sensors can be deployed. This embodiment does not set limitations to the size of the segmented detection region, the number of segmented detection regions, and the locations and number of deployed binary sensors. The examiner interprets that the regions can be shaped like a box as seen in Figure 4 and that they are based on the location of the object.); and a second portion from which corresponding objects are detected, so as to detect corresponding objects from the image corresponding to the second portion (¶[0071], Using the detected objects as shown in FIG. 4 as an example, when the current time is t, the detected objects may appear in detection regions 1, 5, 6, and 7, the detection regions where the detected object may appear are identified by 1, and other detection regions are identified by 0. In this case, with regard to the location of the detected objects.)
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention taught by Fukagai and Jeon with the teachings of Yu in order to divide the image into portions based on the density distribution of the object. One skilled in the art would have been motivated to modify Fukagai and Jeon in this manner in order to improve accuracy of the detection result. (Yu, ¶[0006])
Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAN D HOANG whose telephone number is (571)272-4344. The examiner can normally be reached Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire X. Wang, can be reached on (571) 270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and 





/HAN HOANG/Examiner, Art Unit 2663
/SEAN M CONNER/Primary Examiner, Art Unit 2663