DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1, 3-13 and 15-22 have been considered but are moot in view of the new ground of rejection set forth below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-7, 9-13, and 15-22 are rejected under 35 U.S.C. 103 as being unpatentable over Fukagai et al. (US PG-Pub US 20200074690 A1) in view of Sakai (US PG-Pub US 20210357708 A1).
Regarding Claim 1, Fukagai teaches a detection apparatus comprising: one or more processors, and one or more memories storing executable instructions which, when executed by the one or more processors, cause the detection apparatus to perform operations including (¶[0069] The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs.): extracting features from an image (¶[0087] The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. The CNN layer 33 has a deep neural network including thirteen convolutional layers, four pooling layers, and three fully connected layers. Each convolutional layer extracts information on features, such as edges, using predetermined filters); determining candidate regions of an object in the image based on the extracted features (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that the neural network of the prior art uses the features extracted in the feature map to determine the number of rectangular regions and to place those rectangular regions at the location of the object. The extracted features of the feature map are used to calculate a score for each object candidate region, and the RPN layer removes overlapping regions to arrive at a determined number of 300 object candidate regions.); and detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101] The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that, based on the score of each rectangular region used to detect the location of the object as shown in Fig. 5, the prior art is able to determine the type of object present and where it is located in the image.),
Fukagai does not explicitly teach wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, and as M(N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value.
Sakai teaches wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, ([0018], Here, the region integrating unit may determine candidate regions having top predetermined number of greatest weighted average scores (including average scores) to be related regions. Alternatively, the region integrating unit may determine all the candidate regions having average scores greater than or equal to a threshold to be related regions without limiting the number of candidate regions. ¶[0019] In a case where a target object is included in an image, the first detection unit may determine that a plurality of candidate regions is detected near the target object. By determining related regions as described above, a plurality of candidate regions detected for one target object can be determined to be related regions. The examiner interprets the prior art is determining the number of candidate regions based on the scores of the potential object being in the candidate region is greater than or equal to a threshold.) and as M(N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value. (¶[0069], Then, the candidate regions 411 and 413 close to the representative region 412 have a relation score greater than or equal to a threshold. However the relation scores of the other regions are lower than the threshold. Therefore, it is judged that the candidate regions 411 and 413 are related regions of the representative region 412, and the candidate regions 411 to 413 form one group 421 as illustrated in FIG. 7B. Then, according to the candidate regions 411 to 413, one integrated region is determined. 
The examiner interprets that, for the case where N>M, the number of candidate regions is determined by whether the relation scores are lower than the threshold, with the regions forming a group until one region is determined.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention taught by Fukagai with the teachings of Sakai in order to determine the number of candidate regions using a threshold criterion. One skilled in the art would have been motivated to modify Fukagai in this manner in order to detect the predetermined object faster and more accurately. (Sakai, Abstract)
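For illustration only, the threshold-dependent limitation recited in Claim 1 can be sketched as follows. The function name, the use of a single scalar feature value per image portion, and the example values chosen for N, M, and the first threshold are assumptions made for this sketch; they are not taken from the application, Fukagai, or Sakai.

```python
def regions_for_portion(feature_value, first_threshold, n, m):
    """Return the number of candidate regions for one image portion.

    Hypothetical helper: N regions when the feature is equal to or greater
    than the first threshold, otherwise M regions, with N > M.
    """
    if n <= m:
        raise ValueError("the limitation requires N > M")
    return n if feature_value >= first_threshold else m


# Assumed per-portion feature values, threshold, N, and M (illustrative only).
features = [0.9, 0.2, 0.5]
counts = [regions_for_portion(f, first_threshold=0.5, n=9, m=3) for f in features]
print(counts)
```

In this sketch a portion whose feature value equals the threshold receives N regions, consistent with the "equal or greater than" wording of the claim.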
Regarding Claim 3, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 1, wherein the candidate regions contain a plurality of shapes, and the portion of the image is allocated the plurality of shapes per the candidate region (Fukagai, ¶[0088], Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The examiner interprets that the object candidate regions are differently sized rectangular regions and that they are allocated based on regions that contain the target object.).
Regarding Claim 4, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 3, wherein the extracted features indicate feature values extracted by a feature extracting algorithm or normalized values by normalizing feature values of the extracted features (Fukagai, ¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that the position of the regions is determined by comparing, by means of a score, the probability of the object being in each region, and that ¶[0221] discusses how a region is removed based on whether a threshold is exceeded.).
Regarding Claim 5, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 3, wherein a number of shapes contained in the candidate region for the portion of the image is determined according to the feature corresponding to the portion of the image. (Fukagai, ¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The examiner interprets that the shape distribution of the candidate region is dependent on the extracted features of the object. As seen in Fig. 5 of the prior art there are different sizes of rectangular shaped boxes around each object detected and the RPN layer will determine the number of candidate regions to place on the object for detection.);
Regarding Claim 6, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 1, wherein a total number of the candidate regions allocated to the image is a predefined value (Fukagai, ¶[0088], the RPN layer 35 selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets the predefined value to be the top 6000 candidate regions and the determined number to be the final output of candidate regions, which is 300), and the number of the candidate regions corresponding to the portion of the image is determined further based on the predefined value (Fukagai, as seen in Fig. 5, the position distribution of the candidate regions is based on the predefined value, since the predefined value is the top 6000 object candidate regions ranked by score. The neural network uses that predefined number of regions in order to reduce the overlap between regions and arrive at a final determined number of object candidate regions, which is 300.).
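As a minimal sketch of the score-ranking step quoted from Fukagai ¶[0088] (selecting a fixed number of candidate regions in descending order of score), the following may be considered. The function name and the toy scores are assumptions for illustration; the subsequent overlap-removal step that reduces the set to 300 regions is not shown here.

```python
def select_top_k(scores, k):
    """Return the indices of the k highest scores, in descending score order.

    Hypothetical helper; k plays the role of the predefined value
    (6000 in the quoted passage).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Assumed toy scores for four candidate regions.
scores = [0.10, 0.95, 0.40, 0.80]
kept = select_top_k(scores, k=2)
print(kept)
```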
Regarding Claim 7, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 5, wherein the number of shapes contained in the candidate region (Sakai, ¶[0051] Note that even though the rectangular region is cut out in the embodiment, the region to be cut out may be a region having any shape other than a rectangle.) for the portion of the image is determined as T1 in a case that the feature corresponding to the portion of the image is smaller than a second threshold value (Sakai, ¶[0069], However the relation scores of the other regions are lower than the threshold. Therefore, it is judged that the candidate regions 411 and 413 are related regions of the representative region 412, and the candidate regions 411 to 413 form one group 421 as illustrated in FIG. 7B. Then, according to the candidate regions 411 to 413, one integrated region is determined.), as T2 (T1<T2) in a case that the feature corresponding to the portion of the image is equal or greater than the second threshold value (Sakai, ¶[0018], Here, the region integrating unit may determine candidate regions having top predetermined number of greatest weighted average scores (including average scores) to be related regions. Alternatively, the region integrating unit may determine all the candidate regions having average scores greater than or equal to a threshold to be related regions without limiting the number of candidate regions. ¶[0019] In a case where a target object is included in an image, the first detection unit may determine that a plurality of candidate regions is detected near the target object. By determining related regions as described above, a plurality of candidate regions detected for one target object can be determined to be related regions. The examiner interprets that the prior art determines the number of candidate regions based on whether the scores of the potential object being in the candidate region are greater than or equal to a threshold.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention taught by Fukagai with the teachings of Sakai in order to determine the number of candidate regions using a threshold criterion. One skilled in the art would have been motivated to modify Fukagai in this manner in order to detect the predetermined object faster and more accurately. (Sakai, Abstract)
Regarding Claim 9, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 1, wherein the position of the object in the image is extracted by executing a regression operation based on the extracted features and the determined number, position and shape of the candidate regions. (Fukagai, ¶[0089] The fast R-CNN layer 37 determines, based on the feature map 34 and the object candidate regions 36, the class of an object captured in each object candidate region and calculates a score indicating the credibility measure of the determination result. The fast R-CNN layer 37 selects a small number of image regions with sufficiently high scores. The recognition results 38 output from the fast R-CNN layer 37 include, for each selected image region, the location of the image region, the determined object class, and the score indicating the probability that the object is the particular class. The examiner interprets that neural network is generating a probability or correlation score based on the extracted features of the object in the rectangular shaped region as shown in figure 5 to be the regression operation).
Regarding Claim 10, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 1, wherein the position of the object is detected in a case that a category confidence corresponding to the candidate region is greater than or equal to a third threshold value (Fukagai, ¶[0221] A score-associated location data 151 is a collection of elements indicating object candidate regions, which are sorted in descending order of the scores. The selecting unit 139 of the object candidate region generating unit 135 deletes part of elements from the score-associated location data 151 by the NMS operation. Elements to be deleted are those corresponding to, amongst object candidate regions with low scores, those each having a high proportion of overlap with an object candidate region with a high score. Intersection over Union (IoU) may be used as an index of the region overlap rate. IoU between object candidate regions A and B is defined as: IoU=(A∩B)/(A∪B), and indicates the ratio of the overlapping area between the object candidate regions A and B to the sum of the areas of the two regions. IoU is closer to 1 when there is a higher degree of region overlap while IoU is closer to 0 when there is a lower degree of region overlap. When IoU of the object candidate regions A and B exceeds a threshold (e.g. 0.7), the selecting unit 139 deletes, of the object candidate regions A and B, one with a lower score.), and the third threshold value is determined based on a distribution of the extracted features (Fukagai, ¶[0091] FIG. 5 illustrates an example of the recognition results. ¶[0092] An image 41 of FIG. 5 is a displayed image with bounding boxes indicating object candidate regions output from the RPN layer 35, superimposed over an input image. The RPN layer 35 outputs 300 object candidate regions for a single input image; however, in the example of FIG. 5, only a small number of object candidate regions are presented to facilitate understanding. The image 41 captures, as objects to be detected, a car, a dog, a horse, and two people.).
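The IoU measure and the score-based deletion quoted from Fukagai ¶[0221] can be sketched as follows. This is a common greedy form of non-maximum suppression, offered only as an assumption about how such an operation may be written; the box format (x1, y1, x2, y2), the function names, and the toy boxes are hypothetical, and only the formula IoU=(A∩B)/(A∪B) and the example threshold of 0.7 come from the quoted passage.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: visit boxes by descending score and drop any box whose
    IoU with an already kept, higher-scoring box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Assumed toy data: boxes 0 and 1 overlap heavily, box 2 is separate.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

With these toy boxes, the IoU of boxes 0 and 1 is 90/110, which exceeds 0.7, so the lower-scoring box 1 is deleted.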
Regarding Claim 11, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 1, wherein the position of the object is output by updating based on a shape of the candidate region corresponding to a first portion in a case that a feature corresponding to the first portion is smaller than a fourth threshold value(Sakai, ¶[0069],However the relation scores of the other regions are lower than the threshold. Therefore, it is judged that the candidate regions 411 and 413 are related regions of the representative region 412, and the candidate regions 411 to 413 form one group 421 as illustrated in FIG. 7B. Then, according to the candidate regions 411 to 413, one integrated region is determined.),and the position of the object is output as a position as the candidate region corresponding to a second portion in a case that a feature corresponding to the second portion is greater than or equal to the fourth threshold value. ([0018], Here, the region integrating unit may determine candidate regions having top predetermined number of greatest weighted average scores (including average scores) to be related regions. Alternatively, the region integrating unit may determine all the candidate regions having average scores greater than or equal to a threshold to be related regions without limiting the number of candidate regions. ¶[0019] In a case where a target object is included in an image, the first detection unit may determine that a plurality of candidate regions is detected near the target object. By determining related regions as described above, a plurality of candidate regions detected for one target object can be determined to be related regions. The examiner interprets the prior art is determining the number of candidate regions based on the scores of the potential object being in the candidate region is greater than or equal to a threshold.)  
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention taught by Fukagai with the teachings of Sakai in order to determine the number of candidate regions using a threshold criterion. One skilled in the art would have been motivated to modify Fukagai in this manner in order to detect the predetermined object faster and more accurately. (Sakai, Abstract)
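The grouping described in the quoted Sakai ¶[0069], in which candidate regions whose relation score with a representative region meets a threshold form one group from which one integrated region is determined, can be sketched as follows. The relation scores are supplied as inputs because their computation is not quoted in this action, and integrating the group by averaging box coordinates is an assumption of this sketch rather than Sakai's stated method; all names and values are hypothetical.

```python
def group_and_integrate(representative, candidates, relation_scores, threshold):
    """Group candidates whose relation score is >= threshold with the
    representative region, then average the group's coordinates.

    Boxes are (x1, y1, x2, y2). Averaging is an assumed integration rule.
    """
    group = [representative] + [
        box for box, score in zip(candidates, relation_scores) if score >= threshold
    ]
    n = len(group)
    integrated = tuple(sum(box[k] for box in group) / n for k in range(4))
    return group, integrated


# Assumed toy data: two nearby candidates and one distant candidate.
rep = (10, 10, 20, 20)
cands = [(8, 10, 18, 20), (12, 10, 22, 20), (50, 50, 60, 60)]
rel = [0.8, 0.9, 0.1]
group, integrated = group_and_integrate(rep, cands, rel, threshold=0.5)
print(len(group), integrated)
```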
Regarding Claim 12, the combination of Fukagai and Sakai teaches the detection apparatus according to Claim 1, wherein the feature is obtained by using a pre-generated neural network (Fukagai, ¶[0066], An image recognition apparatus 100 according to the second embodiment uses a neural network to implement image recognition for determining the location and class of each object in an input image. The examiner interprets that all the image processing being performed in the prior art is done by a neural network.)
Regarding Claim 13, Fukagai teaches a detecting method comprising: extracting features from an image (¶[0087] The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. The CNN layer 33 has a deep neural network including thirteen convolutional layers, four pooling layers, and three fully connected layers. Each convolutional layer extracts information on features, such as edges, using predetermined filters); determining candidate regions of an object in the image based on the extracted features (¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that the neural network of the prior art uses the features extracted in the feature map to determine the number of rectangular regions and to place those rectangular regions at the location of the object. The extracted features of the feature map are used to calculate a score for each object candidate region, and the RPN layer removes overlapping regions to arrive at a determined number of 300 object candidate regions.); detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101] The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that, based on the score of each rectangular region used to detect the location of the object as shown in Fig. 5, the prior art is able to determine the type of object present and where it is located in the image.),
Fukagai does not explicitly teach wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, and as M(N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value.
Sakai teaches wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, ([0018], Here, the region integrating unit may determine candidate regions having top predetermined number of greatest weighted average scores (including average scores) to be related regions. Alternatively, the region integrating unit may determine all the candidate regions having average scores greater than or equal to a threshold to be related regions without limiting the number of candidate regions. ¶[0019] In a case where a target object is included in an image, the first detection unit may determine that a plurality of candidate regions is detected near the target object. By determining related regions as described above, a plurality of candidate regions detected for one target object can be determined to be related regions. The examiner interprets the prior art is determining the number of candidate regions based on the scores of the potential object being in the candidate region is greater than or equal to a threshold.) and as M(N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value. (¶[0069], Then, the candidate regions 411 and 413 close to the representative region 412 have a relation score greater than or equal to a threshold. However the relation scores of the other regions are lower than the threshold. Therefore, it is judged that the candidate regions 411 and 413 are related regions of the representative region 412, and the candidate regions 411 to 413 form one group 421 as illustrated in FIG. 7B. Then, according to the candidate regions 411 to 413, one integrated region is determined. 
The examiner interprets that, for the case where N>M, the number of candidate regions is determined by whether the relation scores are lower than the threshold, with the regions forming a group until one region is determined.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention taught by Fukagai with the teachings of Sakai in order to determine the number of candidate regions using a threshold criterion. One skilled in the art would have been motivated to modify Fukagai in this manner in order to detect the predetermined object faster and more accurately. (Sakai, Abstract)
Regarding Claim 15, the combination of Fukagai and Sakai teaches the detecting method according to Claim 13, wherein the candidate regions contain a plurality of shapes, and the portion of the image is allocated the plurality of shapes per the candidate region(Fukagai,¶[0088], Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The examiner interprets that the object candidate regions are differently sized rectangular shaped regions and they are allocated based on regions that contain the target object.)
Regarding Claim 16, the combination of Fukagai and Sakai teaches the detecting method according to Claim 15, wherein a number of shapes contained in the candidate region for the portion of the image is determined according to the feature corresponding to the portion of the image. (Fukagai, ¶[0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The examiner interprets that the shape distribution of the candidate region is dependent on the extracted features of the object. As seen in Fig. 5 of the prior art there are different sizes of rectangular shaped boxes around each object detected and the RPN layer will determine the number of candidate regions to place on the object for detection.);
Regarding Claim 17, the combination of Fukagai and Sakai teaches the detecting method according to Claim 13, wherein a total number of the candidate regions allocated to the image is a predefined value (Fukagai, ¶[0088], the RPN layer 35 selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets the predefined value to be the top 6000 candidate regions and the determined number to be the final output of candidate regions, which is 300), and the number of the candidate regions corresponding to the portion of the image is determined further based on the predefined value (Fukagai, as seen in Fig. 5, the position distribution of the candidate regions is based on the predefined value, since the predefined value is the top 6000 object candidate regions ranked by score. The neural network uses that predefined number of regions in order to reduce the overlap between regions and arrive at a final determined number of object candidate regions, which is 300.).
Regarding Claim 18, the combination of Fukagai and Sakai teaches the detecting method according to Claim 13, wherein the position of the object is extracted from the image by executing a regression operation based on the extracted features and the determined number, position and shape of the candidate regions. (Fukagai, ¶[0089] The fast R-CNN layer 37 determines, based on the feature map 34 and the object candidate regions 36, the class of an object captured in each object candidate region and calculates a score indicating the credibility measure of the determination result. The fast R-CNN layer 37 selects a small number of image regions with sufficiently high scores. The recognition results 38 output from the fast R-CNN layer 37 include, for each selected image region, the location of the image region, the determined object class, and the score indicating the probability that the object is the particular class. The examiner interprets that neural network is generating a probability or correlation score based on the extracted features of the object in the rectangular shaped region as shown in figure 5 to be the regression operation).
Regarding Claim 19, the combination of Fukagai and Sakai teaches the detecting method according to claim 13, wherein the feature is obtained by using a pre-generated neural network. (¶[0066], An image recognition apparatus 100 according to the second embodiment uses a neural network to implement image recognition for determining the location and class of each object in an input image. The examiner interprets that all the image processing being performed in the prior art is done by a neural network.).
Regarding Claim 20, Fukagai teaches an image processing apparatus comprising: an acquisition device which acquires an image or video(¶[0097], The input image may be captured by an image pickup device coupled to the image recognition apparatus 100.); a storage device which stores an instruction([0069], The RAM 102 is volatile semiconductor memory for temporarily storing therein programs to be executed by the CPU 101 and data to be used by the CPU 101 for its computation); and a processor which executes the instruction based on the acquired image or video, such that the processor at least implements the following steps(¶[0069] The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs.): ¶[0087] The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. The CNN layer 33 has a deep neural network including thirteen convolutional layers, four pooling layers, and three fully connected layers. Each convolutional layer extracts information on features, such as edges, using predetermined filters; determine candidate regions of an object in the image based on the extracted features ([0088] Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. 
The higher the score, the higher the probability of the desired object being present. The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that the neural network of the prior art uses the features extracted in the feature map to determine the number of rectangular regions and to place those rectangular regions at the location of the object. The extracted features of the feature map are used to calculate a score for each object candidate region, and the RPN layer removes overlapping regions so that the determined number of regions is 300 object candidate regions.); detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101], The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that, based on the score of each rectangular region used to detect the location of the object as shown in Fig. 5, the prior art is able to determine the type of object present and where it is located in the image.).
Fukagai does not explicitly teach wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, and as M(N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value.
Sakai teaches wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, (¶[0018], Here, the region integrating unit may determine candidate regions having top predetermined number of greatest weighted average scores (including average scores) to be related regions. Alternatively, the region integrating unit may determine all the candidate regions having average scores greater than or equal to a threshold to be related regions without limiting the number of candidate regions. ¶[0019], In a case where a target object is included in an image, the first detection unit may determine that a plurality of candidate regions is detected near the target object. By determining related regions as described above, a plurality of candidate regions detected for one target object can be determined to be related regions. The examiner interprets that the prior art determines the number of candidate regions based on whether the score indicating that the potential object is in the candidate region is greater than or equal to a threshold.) and as M (N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value. (¶[0069], Then, the candidate regions 411 and 413 close to the representative region 412 have a relation score greater than or equal to a threshold. However the relation scores of the other regions are lower than the threshold. Therefore, it is judged that the candidate regions 411 and 413 are related regions of the representative region 412, and the candidate regions 411 to 413 form one group 421 as illustrated in FIG. 7B. Then, according to the candidate regions 411 to 413, one integrated region is determined. 
The examiner interprets that, in the case where N>M, the number of candidate regions is determined by whether the relation scores are lower than the threshold, with the regions forming a group until one region is determined.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention as taught by Fukagai with the teachings of Sakai in order to determine the number of candidate regions using a threshold criterion. One skilled in the art would have been motivated to modify Fukagai in this manner in order to detect the predetermined object faster and more accurately. (Sakai, Abstract)
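The candidate-region selection quoted from Fukagai ¶[0088] (keep the highest-scoring regions, then remove highly overlapping regions until a fixed number remains) can be sketched as follows. This is a minimal illustration only, assuming greedy suppression with an intersection-over-union overlap test; the quoted passage does not state which overlap measure is used, and the names `iou`, `select_candidates`, `top_k`, `overlap_thresh`, and `final_k` are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_candidates(boxes, scores, top_k=6000, overlap_thresh=0.7, final_k=300):
    """Return indices of the retained candidate regions.

    Assumed procedure: sort by score, keep the top_k regions, then greedily
    drop any region that overlaps an already kept region by more than
    overlap_thresh, stopping once final_k regions have been kept.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)[:top_k]
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in kept):
            kept.append(i)
            if len(kept) == final_k:
                break
    return kept
```

The defaults of 6000 and 300 mirror the figures in the quoted passage; the overlap threshold of 0.7 is an assumed value chosen for the sketch.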
Regarding Claim 21, Fukagai teaches an image processing system comprising: an acquisition apparatus which acquires an image or video (¶[0097], The input image may be captured by an image pickup device coupled to the image recognition apparatus 100.); a detection apparatus for detecting an object from the acquired image or video by extracting features from the image (¶[0087], The CNN layer 33 generates the feature map 34 from the input image 32 using a CNN with weights preliminarily obtained through training. The CNN layer 33 has a deep neural network including thirteen convolutional layers, four pooling layers, and three fully connected layers. Each convolutional layer extracts information on features, such as edges, using predetermined filters. Fig. 2 of the prior art shows a CPU 101, and ¶[0069] states: The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs. The examiner interprets that the CPU stores the neural network and is capable of extracting features from the input image.); determine candidate regions of an object in the image based on the extracted features (¶[0088], Using the feature map 34, the RPN layer 35 detects, from the input image 32, the object candidate regions 36 which are image regions likely to contain objects to be detected. The RPN layer 35 sets, as object candidate regions, a large number of rectangular regions of different sizes at different locations on the input image 32, and calculates a score for each object candidate region based on the feature map 34. The score may also be referred to as an evaluation value or credibility measure. The score indicates the probability of a desired object being present in the corresponding object candidate region. The higher the score, the higher the probability of the desired object being present. 
The RPN layer 35 first extracts more than 6000 object candidate regions and selects the top 6000 object candidate regions in descending order of the scores. Then, the RPN layer 35 removes object candidate regions with high degree of overlapping to finally output 300 object candidate regions. The examiner interprets that the neural network of the prior art uses the features extracted in the feature map to determine the number of rectangular regions and to place those rectangular regions at the location of the object. The extracted features of the feature map are used to calculate a score for each object candidate region, and the RPN layer removes overlapping regions so that the determined number of regions is 300 object candidate regions.); detecting a position of the object from the image based on at least the extracted features and the candidate regions (¶[0101], The object predicting unit 136 corresponds to the aforementioned fast R-CNN layer 37. Based on the score-associated location data output from the object candidate region generating unit 135 and the feature map output from the feature map generating unit 134, the object predicting unit 136 determines each rectangular region with an object and the class of the object, and calculates a score indicating the credibility measure of the determination. The examiner interprets that, based on the score of each rectangular region used to detect the location of the object as shown in Fig. 5, the prior art is able to determine the type of object present and where it is located in the image.),
and a processing apparatus which executes a subsequent image processing operation based on the detected object (¶[0071], The image signal processing unit 105 produces video images in accordance with drawing commands from the CPU 101 and displays them on a screen of a display 111 coupled to the image recognition apparatus 100. The examiner interprets that the image signal processing unit processes the image by taking the video images from the CPU and displaying them on a display showing the results of the image recognition.), wherein the acquisition apparatus, the detection apparatus and the processing apparatus are connected each other via a network (¶[0097], The input image may be captured by an image pickup device coupled to the image recognition apparatus 100. Alternatively, the input image may be input to the image recognition apparatus 100 by the user, or sent to the image recognition apparatus 100 from a different information processor via the network 114. The examiner interprets the image pickup device to be the acquisition apparatus, the image recognition apparatus to be the detection apparatus, and the different information processor to be the processing apparatus, all of which are connected via a network.).
Fukagai does not explicitly teach wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, and as M(N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value.
Sakai teaches wherein a number of the candidate regions corresponding to a portion of the image is determined as N in a case that a feature corresponding to the portion of the image is equal or greater than a first threshold value, (¶[0018], Here, the region integrating unit may determine candidate regions having top predetermined number of greatest weighted average scores (including average scores) to be related regions. Alternatively, the region integrating unit may determine all the candidate regions having average scores greater than or equal to a threshold to be related regions without limiting the number of candidate regions. ¶[0019], In a case where a target object is included in an image, the first detection unit may determine that a plurality of candidate regions is detected near the target object. By determining related regions as described above, a plurality of candidate regions detected for one target object can be determined to be related regions. The examiner interprets that the prior art determines the number of candidate regions based on whether the score indicating that the potential object is in the candidate region is greater than or equal to a threshold.) and as M (N>M) in a case the feature corresponding to the portion of the image is smaller than the first threshold value. (¶[0069], Then, the candidate regions 411 and 413 close to the representative region 412 have a relation score greater than or equal to a threshold. However the relation scores of the other regions are lower than the threshold. Therefore, it is judged that the candidate regions 411 and 413 are related regions of the representative region 412, and the candidate regions 411 to 413 form one group 421 as illustrated in FIG. 7B. Then, according to the candidate regions 411 to 413, one integrated region is determined. 
The examiner interprets that, in the case where N>M, the number of candidate regions is determined by whether the relation scores are lower than the threshold, with the regions forming a group until one region is determined.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention as taught by Fukagai with the teachings of Sakai in order to determine the number of candidate regions using a threshold criterion. One skilled in the art would have been motivated to modify Fukagai in this manner in order to detect the predetermined object faster and more accurately. (Sakai, Abstract)
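The threshold-based grouping quoted from Sakai ¶[0069] (candidate regions whose relation score to the representative region meets the threshold join its group; the others do not) can be sketched as follows. This is a minimal illustration only: how the relation score is computed is not stated in the quoted passage, so the scores are taken as given inputs, and the function name `group_related_regions` and the numeric score values are hypothetical.

```python
def group_related_regions(representative_id, relation_scores, threshold):
    """Return the sorted ids of the group formed around a representative region.

    relation_scores maps each other candidate region id to its relation score
    with respect to the representative region (assumed to be precomputed).
    A region joins the group when its score is greater than or equal to
    the threshold; regions scoring below the threshold are left out.
    """
    related = [rid for rid, score in relation_scores.items() if score >= threshold]
    return sorted([representative_id] + related)


# Region ids follow the quoted example; the scores are made-up values.
scores_for_412 = {411: 0.8, 413: 0.7, 414: 0.2, 415: 0.1}
group_421 = group_related_regions(412, scores_for_412, threshold=0.5)
```

With these assumed scores, regions 411 and 413 join representative region 412 in one group, matching the grouping described in the quoted passage, while the two low-scoring regions are excluded.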
Regarding Claim 22, the combination of Fukagai and Sakai teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a computer, cause the computer to perform the detecting method according to claim 13 (¶[0069] The CPU 101 is a processor configured to execute program instructions. The CPU 101 reads out at least part of programs and data stored in the HDD 103, loads them into the RAM 102, and executes the loaded programs. The RAM 102 is volatile semiconductor memory for temporarily storing therein programs to be executed by the CPU 101 and data to be used by the CPU 101 for its computation).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Fukagai et al. US PG-Pub (US 20200074690 A1) in view of Sakai US PG-Pub (US 20210357708 A1) and further in view of El-Khamy et al. US PG-Pub (US 20180089505 A1).
Regarding Claim 8, while Fukagai and Sakai teach the detection apparatus according to claim 1, they do not explicitly teach wherein for objects having different scales in the image, the features are extracted with different levels respectively from the image; wherein for each feature among the extracted features with different levels.
El-Khamy teaches wherein for objects having different scales in the image, the features are extracted with different levels respectively from the image (¶[0045], a primary object detector includes a feed forward convolutional network with features extracted at multiple convolutional scales and resolutions. The examiner interprets that the object detector in the prior art uses a neural network to extract features at multiple scales and resolutions of the image.); wherein for each feature among the extracted features with different levels (¶[0045], Bounding box candidates of different sizes and aspect ratios at each location of the extracted features are further classified as an object or background in the captured image, and localization offsets of the candidate bounding boxes are calculated by bounding box regressions. The examiner interprets that, after the features are extracted, the prior art uses bounding boxes around the extracted features to determine whether an object is present, which is similar in functionality to what the determination unit and the detection unit do in the instant application.).
It would have been obvious to one of ordinary skill in the art before the effective filing date to add the teaching of El-Khamy to Fukagai and Sakai in order to extract features at different scales of the image. One skilled in the art would have been motivated to modify Fukagai and Sakai in this manner in order to perform fast and robust object detection. (El-Khamy, ¶[0002])
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAN D HOANG whose telephone number is (571)272-4344. The examiner can normally be reached Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire X. Wang, can be reached at (571) 270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAN HOANG/ Examiner, Art Unit 2663

/CLAIRE X WANG/Supervisory Patent Examiner, Art Unit 2663