DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-4, 6-12, and 14-17 are pending. Claims 5 and 13 are canceled. 
Specification
With respect to the specification, Applicant has amended Paras. 0003 and 0043 in the specification to correct for minor informalities. Therefore, the objections have been withdrawn. The amended specification has been entered. 
Claim Objections
With respect to the claims, Applicant has amended claims 3, 8, 11 and 16 to correct for minor informalities. Therefore, the objections to claims 3, 8, 11 and 16 have been withdrawn. The amended claims have been entered. 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-12, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Lou et al. (CN 109410261 A, see attached machine translation) in view of Zheng et al. (CN 109889724 A, see attached machine translation) and further in view of “IMAGEJ Image Processing and Practice” by Shuifa et al. (see attached machine translation).
Regarding claim 1, Lou et al. teaches, a method for processing an image, comprising: obtaining an image by a monocular camera (Abstract: the invention discloses a monocular image depth estimation method based on a pyramid pooling module; Abstract: each original monocular image in the training set is used as the original input image; Note: a monocular camera is needed to obtain the monocular image); 
extracting image features with different levels based on the image (As seen in Pg. 3, fourth paragraph, there are multiple feature extraction network blocks and multiple convolution layers (i.e. different levels); Pg. 7, first advantage of the present invention listed: the four feature extraction network blocks in the feature extraction network framework are composed of two residual networks: Conv block and Identity block. Block composition, that is, the method of the present invention extracts features by using a combination of residual network blocks,…and utilizes the pooled blocks in the pyramid pooling module);
determining a fused feature by fusing the image features with different levels (Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers the fusion feature information from multiple scales, thereby obtaining better prediction results and improving the depth estimation; Pg. 9, second paragraph: the two network blocks of Conv block and Identity block effectively ensure the depth of the neural network by merging the feature information of different levels on the input feature map; Pg. 9, third paragraph: the Add fusion layer implements the operation of the Conv block; the Identity block is mainly composed 
determining a depth distribution feature map of the image based on the fused feature, wherein a pixel value of each pixel point in the depth distribution feature map is a depth value (Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers the fusion feature information from multiple scales, thereby obtaining better prediction results and improving the depth estimation; Pg. 9, second paragraph: the two network blocks of Conv block and Identity block effectively ensure the depth of the neural network by merging the feature information of different levels on the input feature map; Pg. 9, third paragraph: the Add fusion layer implements the operation of the Conv block; the Identity block is mainly composed of three convolution layers and one existing Add fusion layer, and the convolution kernel size of the first convolutional layer and the third convolutional layer. 1×1, the convolution kernel size of the second convolutional layer is 3×3, and the input and input of the third convolutional layer are merged through the existing Add fusion layer. The operation of the Identity block is implemented. The Identity block mainly expands the number of output feature maps by blending feature maps). 
Lou et al. does not expressly disclose the following limitations: obtaining a first depth value of a selected focusing point and a second depth value of a pixel point to be blurred of the depth distribution feature map; determining an absolute difference between the first depth value and the second depth value; determining a blurred radius based on the absolute difference; wherein the blurred radius is positively correlated with the absolute difference; determining a blurred kernel based on the blurred radius and a pre-selected convolution kernel; and blurring respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius.

However, Zheng et al. teaches, obtaining a first depth value of a selected focusing point and a second depth value of a pixel point to be blurred of the depth distribution feature map (Para. 0191: generate M blurred images with successively deepened blur degrees for the target image according to the depth image, where M is a natural number; for each pixel in the target image, according to the depth image obtains the pixel position depth of the pixel point and the first distance of the focus depth; according to the first distance, obtains the target blurred image corresponding to the pixel point from the M blurred images; according to the target The blurred image determines the blurred pixel value of the pixel point; As shown in Para. 0191, the focus depth value (i.e. first depth value) and the pixel position depth value of the pixel point (i.e. second depth value) are determined for each pixel in the target image to be blurred (i.e. creates a distribution of depth values). Para. 0131 further shows that the focus depth is selected (i.e. if the focus depth is 3m) as well as the pixel depths);
determining an absolute difference between the first depth value and the second depth value (Para. 0191 shows that the blurred pixel value of the pixel point is determined according to the blurred image and a difference between the pixel depth positions (i.e. absolute difference) is determined);
determining a blurred radius based on the absolute difference (Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the first distance corresponding to the corresponding pixel and the largest positive integer less than the first distance. The interpolation weight of the target blurred image with a low degree of blurring corresponding to the corresponding pixel is the absolute value of the difference between the first distance corresponding to the corresponding pixel and the smallest positive integer greater than the first distance; As shown in Para. 0138, a first distance is determined when blurring the image. The first distance is used to determine the absolute difference (i.e. absolute value of the difference between the first distance…and the smallest positive integer greater than the first distance). Therefore, the first distance is a distance measurement (i.e. distance or radius of 1 meter) and not a distance difference); 
wherein the blurred radius is positively correlated with the absolute difference (Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the first distance corresponding to the corresponding pixel and the largest positive integer less than the first distance. The interpolation weight of the target blurred image with a low degree of blurring corresponding to the corresponding pixel is the absolute value of the difference between the first distance corresponding to the corresponding pixel and the smallest positive integer greater than the first distance; Note: the larger the blurred radius (i.e. distance) is, the more blur there is; As shown in Para. 0138 there is a positive relationship between the blurred radius and absolute difference in that there is more blur (i.e. higher degree of blur of the absolute difference) when the first distance is larger).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate depth values of a pixel to be blurred, determining an absolute difference between the depth values, and determining a blurred radius according to the absolute difference, as taught by Zheng et al., into the image processing of Lou et al. in order to improve prediction accuracy (Zheng et al., Para. 0147).
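For illustration only, the relationship relied on above (a blurred radius that grows with the absolute difference between the depth at the focusing point and the depth of the pixel point to be blurred) can be sketched as follows. The linear mapping, the `scale` factor, the `max_radius` cap, and the function names are assumptions made for this sketch; neither Zheng et al. nor the claims specify them.

```python
def blur_radius(focus_depth, pixel_depth, scale=2.0, max_radius=15):
    """Hypothetical mapping from a depth difference to an integer blur radius."""
    # Absolute difference between the first (focus) and second (pixel) depth values.
    abs_diff = abs(focus_depth - pixel_depth)
    # Assumed monotonic mapping: a larger difference never gives a smaller radius.
    return min(max_radius, int(round(scale * abs_diff)))


def radius_map(depth_map, focus_point, scale=2.0, max_radius=15):
    """Per-pixel radii for a depth map given as a list of rows of depth values."""
    fy, fx = focus_point
    focus_depth = depth_map[fy][fx]
    return [[blur_radius(focus_depth, d, scale, max_radius) for d in row]
            for row in depth_map]
```

In this sketch a pixel at the focus depth receives radius 0 (no blur), and pixels in front of or behind the focus depth are treated symmetrically because only the absolute difference is used.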
The combination of Lou et al. and Zheng et al. does not expressly disclose the following limitations: determining a blurred kernel based on the blurred radius and a pre-selected convolution kernel; and blurring respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius.
However, Shuifa et al. teaches, determining a blurred kernel based on the blurred radius and a pre-selected convolution kernel (As shown in Pgs. 2-3, different kernels are used to convolve the images in which a Gaussian blur is selected (i.e. pre-selected convolution kernel); On Pg. 5, the Gaussian blur has a radius that is set in the radius window. The standard deviation or sigma is further specified as the radius in FIG. 11-2-3); 
and blurring respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius (Pg. 2, second paragraph: using various types of "kernels" to convolve with images to get the effect we want; Pg. 2-3: so if the Gaussian filter is convolved with the original image, the resulting image will become blurred, which is called Gaussian Blur in ImageJ. The specific operation of Gaussian filtering is to scan each pixel in the image with a template (or convolution, mask), and use the weighted average gray value of the pixels in the neighborhood determined by the template to replace the value of the center pixel; Pgs. 2-3 show that each pixel is blurred by the Gaussian filter in an area with a set standard deviation (i.e. area of distribution) and the standard deviation or sigma in FIG. 11-2-5 consists of specifying x and y values (i.e. radii) for the Gaussian blur).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate determining a blurred kernel based on the blurred radius, as taught by Shuifa et al., into the combined image processing of Lou et al. and Zheng et al. in order to improve the blur degree (Shuifa et al., Pg. 3: the larger standard deviation, the more blur there is (and vice versa)).
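As a non-authoritative sketch of the Gaussian filtering described in the cited passages, the following builds a normalized Gaussian template from a radius and replaces a pixel with the weighted average of its neighborhood. The choice of `sigma = radius / 2`, the clamped edge handling, and the function names are assumptions for illustration and are not taken from Shuifa et al. or from ImageJ.

```python
import math


def gaussian_kernel(radius, sigma=None):
    """Normalized (2*radius+1) x (2*radius+1) Gaussian template."""
    if radius <= 0:
        return [[1.0]]  # radius 0 leaves the pixel unchanged
    if sigma is None:
        sigma = radius / 2.0  # assumed relation between radius and sigma
    size = 2 * radius + 1
    raw = [[math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
            for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def blur_pixel(image, y, x, kernel):
    """Weighted average of the neighborhood centered on (y, x); edges are clamped."""
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])
    acc = 0.0
    for ky, row in enumerate(kernel):
        for kx, weight in enumerate(row):
            yy = min(max(y + ky - radius, 0), h - 1)
            xx = min(max(x + kx - radius, 0), w - 1)
            acc += weight * image[yy][xx]
    return acc
```

Because the template is normalized to sum to 1, a uniform region keeps its value after blurring, and a larger radius spreads the weights over a larger area around the reference pixel.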
Regarding claim 2, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 1 above.
Lou et al. further teaches, wherein said determining the fused feature comprises: determining a super-high-level image feature by convoluting the image feature with the highest level through convolution kernels in multiple sizes (As seen in the abstract, a neural network model is used for monocular image depth estimation. Note: neural networks require a computer to carry out training of the images in which the computer contains a memory and a processor; Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the 
and determining the fused feature by fusing the image features and the super-high-level image feature (Pg. 9, second paragraph: the two network blocks of Conv block and Identity block effectively ensure the depth of the neural network by merging the feature information of different levels on the input feature map; Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers the fusion feature information from multiple scales; Note: the fused feature includes the merged feature information of different levels (i.e. fusing the image features) and the high-level feature information (i.e. super-high-level image feature) from the pyramid module).
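The claimed step of convolving the highest-level feature with kernels of multiple sizes can be sketched, under heavy simplification, as below. This is not the pyramid pooling module of Lou et al.: the fixed mean (box) kernels, the odd sizes 1, 3 and 5, the averaging of the responses, and all names are assumptions standing in for learned convolution weights.

```python
def box_filter(feature, size):
    """Same-size mean filter with an odd kernel size; edges are clamped."""
    r = size // 2
    h, w = len(feature), len(feature[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [feature[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def super_high_level(feature, sizes=(1, 3, 5)):
    """Average the responses of the highest-level feature to several kernel sizes."""
    responses = [box_filter(feature, s) for s in sizes]
    h, w = len(feature), len(feature[0])
    return [[sum(resp[y][x] for resp in responses) / len(responses)
             for x in range(w)] for y in range(h)]
```

Each kernel size aggregates context at a different scale, which is the sense in which the result carries multi-scale information about the highest-level feature.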
Regarding claim 3, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 2 above.
Zheng et al. further teaches, wherein said determining the fused feature by fusing the image features and the super-high-level image feature comprises: fusing the image features in order of levels of the image features from highest to lowest (Para. 0027: the depth prediction model sequentially includes a basic model, a multi-scale model, a feature fusion layer, and a prediction output layer; Para. 0030: the feature fusion layer is used to restore the resolution of the current input image and reduce the number of channels, and to fuse the features output by the basic model; Para. 0143: the basic model is used to extract the features of the current input image to provide features for the Multi-Scale Model; and the features can include but are not limited to the features from the bottom to the high level of the corresponding image. For example, the bottom features can include the edges of the image, Corner points, texture and color information, etc.; middle-level features can include image geometric information, such as circles, rectangles, triangles, and other structures; high-level features can include image semantic information, such as people, buildings, sky, and so on; As seen in Fig. 3, the highest features are fused first (i.e. Conv2d(128->64)) followed by the lowest features (i.e. Conv2d(32->16))); 
wherein fusing the super-high-level image feature and the image feature with the highest level firstly (Para. 0146: the feature fusion layer is used to restore the resolution of the current input image and reduce the number of channels, and to fuse the features output by the basic model, so that the features from the bottom layer to the high layer can be considered; As seen in Fig. 3, the highest features are fused first (i.e. Conv2d(128->64)) followed by the lowest 
and wherein fusing the image features based on a result of the previous said fusing the image features, and the image feature with a corresponding level (Para. 0147: an additive fusion method can be used, which can reduce the amount of calculation. In this process, the part enlarged from the small image is from the high-level features, and the corresponding layer before the addition is the low-level feature. Therefore, this method uses both the low-level feature and the high-level feature to ensure that useful information is not lost; As seen in Fig. 3, features extracted by the convolutional network layer in the base model can be input to the feature fusion convolutional network layer (i.e. feature fuse layer) to obtain an output result (i.e. prediction layer));
wherein times of said fusing the image features are smaller than or equal to a total number of the different levels, and a resolution of the fused feature is the same as a resolution of the image (Para. 0146: the feature fusion layer is used to restore the resolution of the current input image and reduce the number of channels, and to fuse the features output by the basic model, so that the features from the bottom layer to the high layer can be considered; Note: one of ordinary skill in the art would recognize that the result of the last feature fusion operation prior to prediction can be set to the same resolution as the image to be processed, and that more fused features are obtained, improving accuracy, by setting the number of fusion operations equal to the total number of levels of image features).
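The highest-to-lowest additive fusion discussed for claim 3 can be illustrated with the sketch below. It assumes single-channel feature maps stored as lists of rows, a factor-of-two resolution step between adjacent levels, nearest-neighbour enlargement, and a lowest level already at image resolution; these choices and the function names are hypothetical and are not drawn from Zheng et al.

```python
def upsample2x(feature):
    """Nearest-neighbour enlargement by a factor of two in each direction."""
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise (additive) fusion of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_top_down(features, super_high):
    """features: lowest level first; super_high has the shape of features[-1]."""
    # First fusion: super-high-level feature with the highest-level feature.
    fused = add_maps(super_high, features[-1])
    count = 1
    # Each later fusion combines the previous result with the next lower level.
    for level in reversed(features[:-1]):
        fused = add_maps(upsample2x(fused), level)
        count += 1
    return fused, count
```

With three levels the sketch performs three fusions, so the number of fusions equals the number of levels, and the final map has the resolution of the lowest-level feature.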
Regarding claim 4, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 2 above.
Zheng et al. further teaches, further comprising: determining the depth distribution 
Regarding claim 7, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 1 above.
Zheng et al. further teaches, wherein said determining the blurred radius further comprises: receiving a second operation instruction for the blurred radius, wherein the second operation instruction comprises an aperture value (Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the first distance corresponding to the corresponding pixel and the largest positive integer less than the first distance. The interpolation weight of the target blurred image with a low degree of blurring corresponding to the corresponding pixel is the absolute value of the difference 
and determining the blurred radius based on the aperture value and the determined blurred radius (Para. 0157: blur represents the aperture value, the aperture range is 0~1; the larger the value, the larger the aperture, the shallower the depth of field, and the more blurred the background; Note: the greater the blurred radius, the greater the degree of blur. The aperture value and the blurred radius are therefore both positively related to the degree of blur and together control the blur operation, so that a new blurred radius can be obtained by multiplying the two).
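The multiplication suggested in the note above can be written as a short sketch. The function name and the range check are assumptions for illustration; the 0 to 1 aperture range is the one quoted from Para. 0157.

```python
def adjusted_radius(base_radius, aperture):
    """Scale a previously determined blur radius by an aperture value in [0, 1]."""
    if not 0.0 <= aperture <= 1.0:
        raise ValueError("aperture is expected to lie in the range 0 to 1")
    # Both factors increase the degree of blur, so their product does as well.
    return base_radius * aperture
```

For example, an aperture value of 0.5 halves the radius obtained from the depth difference, and an aperture value of 0 removes the blur entirely.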
Regarding claim 8, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 1 above.
Shuifa et al. further teaches, wherein convolution kernels are in at least two of the following shapes: heart shape, pentagram, circle, pentagon, and butterfly shape (Pg. 3: using various types of "kernels" to convolve with images to get the effect we want; Note: those 
Regarding claim 9, Lou et al. teaches, a device for processing an image, comprising: a memory and at least one processor, wherein the at least one processor is configured to read and execute instructions stored in the memory to (As seen in the abstract, a neural network model is used for monocular image depth estimation. Note: neural networks require a computer (i.e. device) to carry out training of the images in which the computer contains a memory and a processor): 
obtain an image by a monocular camera (Abstract: each original monocular image in the training set is used as the original input image; Note: a monocular camera is needed to obtain the monocular image);
extract image features with different levels based on the image (As seen in Pg. 3, fourth paragraph, there are multiple feature extraction network blocks and multiple convolution layers (i.e. different levels); Pg. 7, first advantage of the present invention listed: the four feature extraction network blocks in the feature extraction network framework are composed of two residual networks: Conv block and Identity block. Block composition, that is, the method of the present invention extracts features by using a combination of residual network blocks,…and utilizes the pooled blocks in the pyramid pooling module); 
determine a fused feature by fusing the image features with different levels (Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers 
determine a depth distribution feature map of the image based on the fused feature, wherein a pixel value of each pixel point in the depth distribution feature map is a depth value (Pg. 3, fifth paragraph: for the feature extraction network framework, the input end of the first feature extraction network block receives all the feature maps in P1, and the output end of the first feature extraction network block outputs a K' amplitude feature map, and the set of K' amplitude feature maps is formed; Pg. 6, first paragraph: for the output layer, it comprises a convolution layer, wherein the convolution layer has a convolution kernel size of 3×3, the activation function adopts a linear rectification function; the input end of the output layer receives all the feature maps in C, and the output layer The output end outputs a predicted depth image corresponding to the original input image; Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the 
Lou et al. does not expressly disclose the following limitations: obtain a first depth value of a selected focusing point and a second depth value of a pixel point to be blurred of the depth distribution feature map; determine an absolute difference between the first depth value and the second depth value; determine a blurred radius based on the absolute difference; wherein the blurred radius is positively correlated with the absolute difference; determine a blurred kernel based on the blurred radius and a pre-selected convolution kernel; and blur respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius.
However, Zheng et al. teaches, obtain a first depth value of a selected focusing point and a second depth value of a pixel point to be blurred of the depth distribution feature map (Para. 0194: the device 500 may include one or more of the following components: a processing component 502, a memory 504; As shown in Para. 0191, the focus depth value (i.e. first depth value) and the pixel position depth value of the pixel point (i.e. second depth value) are determined for each pixel in the target image to be blurred (i.e. creates a distribution of depth values). Para. 0131 further shows that the focus depth is selected (i.e. if the focus depth is 3m) as well as the pixel depths); 
determine an absolute difference between the first depth value and the second depth value (Para. 0191: according to the first distance, obtains the target blurred image corresponding to the pixel point from the M blurred images; according to the target The blurred image determines the blurred pixel value of the pixel point; Para. 0126: Gaussian blur may be used to generate M blurred images with successively deepened blur; Para. 0138: for example, for the above 6 blurred images, in the order of deepening of the blur program, they are P1, P2, P3, P4, P5, and P6. The pixel with the first distance of 1 meter corresponds to P1, and the first distance is 2 meters. The pixels of corresponds to P2, and the pixels of the first distance greater than 1 meter and less than 2 meters correspond to P1 and P2, and so on; Para. 0191 shows that the blurred pixel value of the pixel point is determined according to the blurred image and a difference between the pixel depth positions (i.e. absolute difference) is determined); 
determine a blurred radius based on the absolute difference (Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the first distance corresponding to the corresponding pixel and the largest positive integer less than the first distance. The interpolation weight of the target blurred image with a low degree of blurring corresponding to the corresponding pixel is the absolute value of the difference between the first distance corresponding to the corresponding pixel and the smallest positive integer greater than the first distance; As shown in Para. 0138, a first distance is determined when blurring the image. The first distance is used to determine the absolute difference (i.e. absolute value of the difference between the first distance…and the smallest positive integer greater than the first distance). Therefore, the first distance is a distance measurement (i.e. distance or radius of 1 meter) and not a distance difference); 
wherein the blurred radius is positively correlated with the absolute difference (Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the first distance corresponding to the corresponding pixel and the largest positive integer less than the first distance. The interpolation weight of the target blurred image with a low degree of blurring corresponding to the corresponding pixel is the absolute value of the difference between the first distance corresponding to the corresponding pixel and the smallest positive integer greater than the first distance; Note: the larger the blurred radius (i.e. distance) is, the more blur there is; As shown in Para. 0138 there is a positive relationship between the blurred radius and absolute difference in that there is more blur (i.e. higher degree of blur of the absolute difference) when the first distance is larger).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate depth values of a pixel to be blurred, determining an absolute difference between the depth values, and determining a blurred radius according to the absolute difference, as taught by Zheng et al., into the image processing of Lou et al. in order to improve prediction accuracy (Zheng et al., Para. 0147).
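The interpolation between successively blurred images quoted from Para. 0138 of Zheng et al. can be sketched as follows, on one reading of the machine translation: a pixel whose first distance lies between two integers is blended from the two neighbouring blurred images, with the more blurred image weighted by the distance above the lower integer. The function names, the dictionary representation, and the treatment of index 0 as the unblurred image are assumptions of this sketch.

```python
import math


def interpolation_weights(first_distance):
    """Map a first distance to {blur index: weight} for the neighbouring images."""
    lower = math.floor(first_distance)
    upper = math.ceil(first_distance)
    if lower == upper:
        return {lower: 1.0}  # integer distance: a single blurred image applies
    return {
        lower: upper - first_distance,   # less blurred image
        upper: first_distance - lower,   # more blurred image
    }


def blended_value(first_distance, blurred_values):
    """blurred_values maps a blur index k to this pixel's value in image P_k."""
    weights = interpolation_weights(first_distance)
    return sum(w * blurred_values[k] for k, w in weights.items())
```

Under this reading the contribution of the more blurred image rises as the first distance rises, which is the positive relationship the rejection relies on.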
The combination of Lou et al. and Zheng et al. does not expressly disclose the following limitations: determine a blurred kernel based on the blurred radius and a pre-selected convolution kernel; and blur respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius.
However, Shuifa et al. teaches, determine a blurred kernel based on the blurred radius and a pre-selected convolution kernel (Pg. 2, second paragraph: using various types of "kernels" to convolve with images to get the effect we want; Pg. 2-3: so if the Gaussian filter is convolved with the original image, the resulting image will become blurred, which is called Gaussian Blur in ImageJ. The specific operation of Gaussian filtering is to scan each pixel in the image with a template (or convolution, mask), and use the weighted average gray value of the pixels in the neighborhood determined by the template to replace the value of the center pixel of the template; Pg. 3: using Gaussian blur, users can set the standard deviation of the Gaussian function independently. The larger the standard deviation, the more serious the blur degree, as shown in Figure 11-2-4; Note: the standard deviation is the radius; As shown in Pgs. 2-3, different kernels are used to convolve the images in which a Gaussian blur is selected (i.e. pre-selected convolution kernel); On Pg. 5, the Gaussian blur has a radius that is set in the radius window. The standard deviation or sigma is further specified as the radius in FIG. 11-2-3); 
and blur respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius (Pg. 2, second paragraph: using various types of "kernels" to convolve with images to get the effect we want; Pg. 2-3: so if the Gaussian filter is convolved with the original image, the resulting image will become blurred, which is called Gaussian Blur in ImageJ. The specific operation of Gaussian filtering is to scan each pixel in the image with a template (or convolution, mask), and use the weighted average gray value of the pixels in the neighborhood determined by the template to replace the value of the center pixel of the template; Pg. 3: using Gaussian blur, users can set the standard deviation of the Gaussian function independently. The larger the standard deviation, the more serious the blur degree, as shown in Figure 11-2-4; Note: if the three-dimensional map is blurred, the standard deviation is set for each direction in FIG. 11-2-5. As can be seen from the analysis of the following figures, the standard deviation is the radius; Pgs. 2-3 show that each pixel is blurred by the Gaussian filter in an area with a set standard deviation (i.e. area of distribution) and the standard deviation or sigma in FIG. 11-2-5 consists of specifying x and y values (i.e. radii) for the Gaussian blur).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate determining a blurred kernel 
Regarding claim 10, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 9 above. 
Lou et al. further teaches, wherein the at least one processor is further configured to read and execute instructions stored in the memory to: determine a super-high-level image feature by convoluting the image feature at the highest level through convolution kernels in multiple sizes (As seen in the abstract, a neural network model is used for monocular image depth estimation. Note: neural networks require a computer to carry out training of the images in which the computer contains a memory and a processor; Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers the fusion feature information from multiple scales, thereby obtaining better prediction results and improving the depth estimation; Pg. 9, second paragraph: the two network blocks of Conv block and Identity block effectively ensure the depth of the neural network by merging the feature information of different levels on the input feature map, which is beneficial to feature extraction; the Conv block is mainly composed of 4 convolution layers (including The first convolutional layer of the main branch and the one convolutional layer of the side branch are respectively formed as the first convolutional layer to the fourth convolutional layer, and one existing Add fusion layer, and the first one of the main branch The convolution kernel size of the convolutional layer and the third 
and determine the fused feature by fusing the image features and the super-high-level image feature (Pg. 9, second paragraph: the two network blocks of Conv block and Identity block effectively ensure the depth of the neural network by merging the feature information of different levels on the input feature map; Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers the fusion feature information from multiple scales; Note: the fused feature includes the merged feature information of different levels (i.e. fusing the image features) and the high-level feature information (i.e. super-high-level image feature) from the pyramid module).
Regarding claim 11, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 9 above.
Zheng et al. further teaches, wherein the at least one processor is further configured to read and execute instructions stored in the memory to: fuse the image features in order of levels of the image features from highest to lowest (Para. 0194: the device 500 may include one or more of the following components: a processing component 502, a memory 504; Para. 0196; Para. 0027: the depth prediction model sequentially includes a basic model, a multi-scale model, a feature fusion layer, and a prediction output layer; Para. 0030: the feature fusion layer is used to restore the resolution of the current input image and reduce the number of channels, 
fuse the super-high-level image feature and the image feature with the highest level firstly (Para. 0146: the feature fusion layer is used to restore the resolution of the current input image and reduce the number of channels, and to fuse the features output by the basic model, so that the features from the bottom layer to the high layer can be considered; As seen in Fig. 3, the highest features are fused first (i.e. Conv2d(128->64)) followed by the lowest features (i.e. Conv2d(32->16)); 
and fuse the image features based on a result of the previous said fusing the image features, and the image feature with a corresponding level (Para. 0147: an additive fusion method can be used, which can reduce the amount of calculation. In this process, the part enlarged from the small image is from the high-level features, and the corresponding layer before the addition is the low-level feature. Therefore, this method uses both the low-level feature and the high-level feature to ensure that useful information is not lost; As seen in Fig. 3, features extracted by the convolutional network layer in the base model can be input to the 
wherein times of said fusing the image features are smaller than or equal to a total number of the different levels, and a resolution of the fused feature is the same as a resolution of the image (Para. 0146: the feature fusion layer is used to restore the resolution of the current input image and reduce the number of channels, and to fuse the features output by the basic model, so that the features from the bottom layer to the high layer can be considered; Note: one of ordinary skill in the art would recognize that the result of the last feature fusion operation prior to prediction can be set to the same resolution as the image to be processed, and that setting the number of fusion operations equal to the total number of levels of image features yields more fused features and improves accuracy).
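For illustration only, the top-down additive fusion order discussed for claim 11 can be sketched as below. This is a minimal sketch under stated assumptions, not the network of Zheng et al.: the names `upsample2x`, `add`, and `fuse_top_down` are hypothetical, each feature map is a single-channel 2D list, each level is assumed to have twice the height and width of the level above it, and the channel-reduction convolutions shown in Fig. 3 are omitted.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (assumption)


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour upsampling: doubles height and width."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise (additive) fusion of two equally sized maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_top_down(levels: List[Grid], top: Grid) -> Grid:
    """Fuse from the highest level to the lowest.

    levels[0] is the lowest level (largest map) and levels[-1] the highest.
    `top` stands in for the super-high-level feature and has the same size
    as levels[-1]. The number of fusions equals len(levels).
    """
    fused = add(top, levels[-1])  # first fusion: super-high-level + highest level
    for lower in reversed(levels[:-1]):
        # each later fusion uses the previous result and the next level down
        fused = add(upsample2x(fused), lower)
    return fused


levels = [
    [[1.0] * 4 for _ in range(4)],  # lowest level, 4x4
    [[2.0] * 2 for _ in range(2)],  # middle level, 2x2
    [[3.0]],                        # highest level, 1x1
]
result = fuse_top_down(levels, top=[[4.0]])
```

In this sketch the result has the resolution of `levels[0]`; treating that level as having the resolution of the input image is an assumption of the example.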
Regarding claim 12, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 9 above.
Zheng et al. further teaches, wherein the at least one processor is further configured to read and execute instructions stored in the memory to: determine the depth distribution feature map with one channel based on the fused feature, wherein the fused feature comprises at least two channels (Para. 0194: the device 500 may include one or more of the following components: a processing component 502, a memory 504; Para. 0196; Para. 0147: since each time the Base Model process is convolved, the number of channels of the feature map will be thicker by 2 times, and the length and width of the channel will be reduced by half. Therefore, each time the Feature Fuse layer is convolved, the number of channels will be reduced. Reduced by half and doubled the length and width dimensions, but the process of reducing and 
Regarding claim 15, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 9 above.
Zheng et al. further teaches, wherein the at least one processor is further configured to read and execute instructions stored in the memory to: receive a second operation instruction for the blurred radius, wherein the second operation instruction comprises an aperture value (Para. 0194: the device 500 may include one or more of the following components: a processing component 502, a memory 504; Para. 0196; Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the first distance corresponding to the corresponding pixel and the largest positive integer less than the first distance. The interpolation weight of the target blurred image with a low degree of blurring corresponding to the corresponding pixel is the absolute value of the difference between the first distance corresponding to the corresponding pixel and the smallest positive integer greater than the first distance; Note: the larger the blurred radius (i.e. distance) is, the more blur there is; Para. 0154: according to the depth image, the user selects the target area to be focused to 
and determine the blurred radius based on the aperture value and the determined blurred radius (Para. 0157: blur represents the aperture value, the aperture range is 0~1, the larger the value, the larger the aperture value, the larger the aperture Larger, the shallower the depth of field, the more blurred the background; Note: the greater the blur radius, the greater the degree of blur, thus the combination of the two together controls the blur operation, both of which have a positive degree of blur, so that a new radius of blur can be obtained by multiplying the two).
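For illustration only, the multiplication described in the note above can be sketched as follows. The function name `scaled_blur_radius` is hypothetical, and the sketch assumes the aperture value lies in the 0~1 range quoted from Para. 0157; it is not presented as the implementation of Zheng et al.

```python
def scaled_blur_radius(base_radius: float, aperture: float) -> float:
    """Return a new blur radius as the product of a previously
    determined radius and an aperture value in [0, 1] (assumed range)."""
    if base_radius < 0:
        raise ValueError("base_radius must be non-negative")
    if not 0.0 <= aperture <= 1.0:
        raise ValueError("aperture must be within [0, 1]")
    return base_radius * aperture


# A larger aperture value yields a larger radius, i.e. more blur.
small = scaled_blur_radius(8.0, 0.25)
large = scaled_blur_radius(8.0, 0.5)
```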
Regarding claim 16, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 9 above.
Shuifa et al. further teaches, wherein convolution kernels are in at least two of the following shapes: heart shape, pentagram, circle, pentagon, and butterfly shape (Pg. 3: using various types of "kernels" to convolve with images to get the effect we want; Note: those skilled in the art will appreciate that different convolutional kernel types can refer to different shapes (i.e. circle, pentagon, etc.)).
Regarding claim 17, Lou et al. teaches, a non-transitory computer storage medium, storing computer executable instructions, wherein the computer executable instructions are 
obtain an image by a monocular camera (Abstract: each original monocular image in the training set is used as the original input image; Note: a monocular camera is needed to obtain the monocular image);
extract image features with different levels based on the image (As seen in Pg. 3, fourth paragraph, there are multiple feature extraction network blocks and multiple convolution layers (i.e. different levels); Pg. 7, first advantage of the present invention listed: the four feature extraction network blocks in the feature extraction network framework are composed of two residual networks: Conv block and Identity block. Block composition, that is, the method of the present invention extracts features by using a combination of residual network blocks,…and utilizes the pooled blocks in the pyramid pooling module); 
determine a fused feature by fusing the image features with different levels (Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers the fusion feature information from multiple scales, thereby obtaining better prediction results and improving the depth estimation; Pg. 9, second paragraph: the two network blocks of Conv block and Identity block effectively ensure the depth of the neural network by merging the feature information of different levels on the input feature map; Pg. 9, third paragraph: the Add 
determine a depth distribution feature map of the image based on the fused feature, wherein a pixel value of each pixel point in the depth distribution feature map is a depth value (Pg. 3, fifth paragraph: for the feature extraction network framework, the input end of the first feature extraction network block receives all the feature maps in P1, and the output end of the first feature extraction network block outputs a K' amplitude feature map, and the set of K' amplitude feature maps is formed; Pg. 6, first paragraph: for the output layer, it comprises a convolution layer, wherein the convolution layer has a convolution kernel size of 3×3, the activation function adopts a linear rectification function; the input end of the output layer receives all the feature maps in C, and the output layer The output end outputs a predicted depth image corresponding to the original input image; Pg. 7, fourth advantage of the present invention listed: the method of the invention utilizes the feature extraction capability of the feature extraction network framework and the pyramid pooling module, fully utilizes the high-level and low-level feature information, and considers the fusion feature information from multiple scales, thereby obtaining better prediction results and improving the depth estimation; Note: the depth prediction (i.e. depth distribution) is determined after considering the fusion 
Lou et al. does not expressly disclose the following limitations: obtain a first depth value of a selected focusing point and a second depth value of a pixel point to be blurred of the depth distribution feature map; determine an absolute difference between the first depth value and the second depth value; determine a blurred radius based on the absolute difference; wherein the blurred radius is positively correlated with the absolute difference; determine a blurred kernel based on the blurred radius and a pre-selected convolution kernel; and blur respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius.
However, Zheng et al. teaches, obtain a first depth value of a selected focusing point and a second depth value of a pixel point to be blurred of the depth distribution feature map (Para. 0194: the device 500 may include one or more of the following components: a processing component 502, a memory 504; Para. 0196; Para. 0191: generate M blurred images with successively deepened blur degrees for the target image according to the depth image, where M is a natural number; for each pixel in the target image, according to the depth image obtains the pixel position depth of the pixel point and the first distance of the focus depth; according to As shown in Para. 0191, the focus depth value (i.e. first depth value) and the pixel position depth value of the pixel point (i.e. second depth value) are determined for each pixel in the target image to be blurred (i.e. creates a distribution of depth values). Para. 0131 further shows that the focus depth is selected (i.e. if the focus depth is 3m) as well as the pixel depths); 
determine an absolute difference between the first depth value and the second depth value (Para. 0191: according to the first distance, obtains the target blurred image corresponding to the pixel point from the M blurred images; according to the target The blurred image determines the blurred pixel value of the pixel point; Para. 0126: Gaussian blur may be used to generate M blurred images with successively deepened blur; Para. 0138: for example, for the above 6 blurred images, in the order of deepening of the blur program, they are P1, P2, P3, P4, P5, and P6. The pixel with the first distance of 1 meter corresponds to P1, and the first distance is 2 meters. The pixels of corresponds to P2, and the pixels of the first distance greater than 1 meter and less than 2 meters correspond to P1 and P2, and so on; Para. 0191 shows that the blurred pixel value of the pixel point is determined according to the blurred image and a difference between the pixel depth positions (i.e. absolute difference) is determined); 
determine a blurred radius based on the absolute difference (Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the As shown in Para. 0138, a first distance is determined when blurring the image. The first distance is used to determine the absolute difference (i.e. absolute value of the difference between the first distance…and the smallest positive integer greater than the first distance). Therefore, the first distance is a distance measurement (i.e. distance or radius of 1 meter) and not a distance difference); 
wherein the blurred radius is positively correlated with the absolute difference (Para. 0138: for the pixels corresponding to multiple target blurred images, the interpolation weight of the target blurred image corresponding to the corresponding pixel with a higher degree of blur can be set to the first distance corresponding to the corresponding pixel and the largest positive integer less than the first distance. The interpolation weight of the target blurred image with a low degree of blurring corresponding to the corresponding pixel is the absolute value of the difference between the first distance corresponding to the corresponding pixel and the smallest positive integer greater than the first distance; Note: the larger the blurred radius (i.e. distance) is, the more blur there is; As shown in Para. 0138 there is a positive relationship between the blurred radius and absolute difference in that there is more blur (i.e. higher degree of blur of the absolute difference) when the first distance is larger).
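For illustration only, the selection and interpolation between successively blurred images quoted from Para. 0138 can be sketched as below. This is one reading of the machine translation, not a statement of what Zheng et al. implements: the names `first_distance` and `blur_levels` are hypothetical, level 0 is assumed to stand for the unblurred target image, and distances beyond M are assumed to be clamped to the most blurred image.

```python
import math
from typing import List, Tuple


def first_distance(focus_depth: float, pixel_depth: float) -> float:
    """Absolute difference between the focus depth and the pixel depth."""
    return abs(pixel_depth - focus_depth)


def blur_levels(distance: float, m: int) -> List[Tuple[int, float]]:
    """Map a first distance to (blurred-image index, interpolation weight) pairs.

    Index k refers to blurred image Pk; index 0 is the unblurred image
    (assumption). A whole-number distance maps to a single image; a
    distance between two integers is linearly interpolated between the
    two neighbouring images.
    """
    d = min(distance, float(m))  # clamp to the most blurred image (assumption)
    lo = math.floor(d)
    hi = math.ceil(d)
    if lo == hi:
        return [(lo, 1.0)]
    # weight of the less blurred image: |d - smallest integer above d|
    # weight of the more blurred image: d - largest integer below d
    return [(lo, hi - d), (hi, d - lo)]


# Focus depth 3 m, pixel depth 4.25 m: first distance 1.25 m, between P1 and P2.
weights = blur_levels(first_distance(3.0, 4.25), m=6)
```

Under these assumptions a larger first distance selects a more heavily blurred image, which is the positive relationship relied on above.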
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention was made to include depth values of a pixel to be 
The combination of Lou et al. and Zheng et al. does not expressly disclose the following limitations underlined above: determine a blurred kernel based on the blurred radius and a pre-selected convolution kernel; and blur respective pixel points in an area of the depth distribution feature map based on the blurred kernel, wherein the area is an area with the pixel point to be blurred as a reference and the blurred radius as a radius.
However, Shuifa et al. teaches, determine a blurred kernel based on the blurred radius and a pre-selected convolution kernel (Pg. 2, second paragraph: using various types of "kernels" to convolve with images to get the effect we want; Pg. 2-3: so if the Gaussian filter is convolved with the original image, the resulting image will become blurred, which is called Gaussian Blur in ImageJ. The specific operation of Gaussian filtering is to scan each pixel in the image with a template (or convolution, mask), and use the weighted average gray value of the pixels in the neighborhood determined by the template to replace the value of the center pixel of the template; Pg. 3: using Gaussian blur, users can set the standard deviation of the Gaussian function independently. The larger the standard deviation, the more serious the blur degree, as shown in Figure 11-2-4; Note: the standard deviation is the radius; As shown in Pgs. 2-3, different kernels are used to convolve the images in which a Gaussian blur is selected (i.e. pre-selected convolution kernel); On Pg. 5, the Gaussian blur has a radius that is set in the radius window. The standard deviation or sigma is further specified as the radius in FIG. 11-2-3); 
Shuifa et al. also teaches on Pgs. 2-3 that each pixel is blurred by the Gaussian filter in an area with a set standard deviation (i.e. area of distribution), and the standard deviation or sigma in FIG. 11-2-5 consists of specifying x and y values (i.e. radii) for the Gaussian blur.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention was made to include determining a blurred kernel based on the blurred radius as taught by Shuifa et al. into the combined image processing of Lou et al. and Zheng et al. in order to improve the blur degree (Shuifa et al., Pg. 3: the larger the standard deviation, the more blur there is (and vice versa)).
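For illustration only, Gaussian filtering of the kind described from Shuifa et al. (a template scanned over each pixel, replacing the center value with a weighted average of its neighborhood) can be sketched as below. The names `gaussian_kernel` and `blur_pixel` are hypothetical, the radius and standard deviation are taken as separate parameters, and clamping coordinates at the image border is an assumption of this sketch rather than ImageJ's documented behavior.

```python
import math
from typing import List

Grid = List[List[float]]


def gaussian_kernel(radius: int, sigma: float) -> Grid:
    """Build a normalised (2*radius+1) x (2*radius+1) Gaussian template."""
    if radius < 0 or sigma <= 0:
        raise ValueError("radius must be >= 0 and sigma must be > 0")
    size = 2 * radius + 1
    k = [
        [
            math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2.0 * sigma * sigma))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def blur_pixel(img: Grid, cy: int, cx: int, kernel: Grid) -> float:
    """Weighted average of the neighbourhood centred on (cy, cx)."""
    radius = len(kernel) // 2
    h, w = len(img), len(img[0])
    acc = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y = min(max(cy + dy, 0), h - 1)  # clamp at the border (assumption)
            x = min(max(cx + dx, 0), w - 1)
            acc += img[y][x] * kernel[dy + radius][dx + radius]
    return acc


narrow = gaussian_kernel(radius=1, sigma=1.0)
wide = gaussian_kernel(radius=1, sigma=2.0)
flat_image = [[2.0] * 5 for _ in range(5)]
blurred_corner = blur_pixel(flat_image, 0, 0, narrow)
```

With the same radius, the larger standard deviation gives the center pixel less weight, which is consistent with the cited statement that a larger standard deviation produces more blur.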
Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Lou et al. (CN 109410261 A, see attached machine translation) in view of Zheng et al. (CN 109889724 A, see attached machine translation) and further in view of “IMAGEJ Image Processing and Practice” by Shuifa et al. (see attached machine translation) and Kang et al. (CN 104102068 A, see attached machine translation).
Regarding claim 6, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 1 above.
Zheng et al. further teaches, wherein acquiring the selected focusing point comprises: obtaining image blocks by segmenting the image based on a size of a focusing frame (Para. 0153: When the user takes a photo, the aforementioned depth prediction model is used to obtain the depth image of the corresponding photo; Para. 0154: According to the depth image, the user selects the target area to be focused to form a corresponding large aperture map, as shown in Figures 4a and 4b, the focus point is different, and different large-aperture effects are formed; Para. 0157: focus and the orange frame represent the focus position, the focal length range is 0~1, the value will change when the position changes);
receiving a first operation instruction of the focusing frame area, wherein the first operation instruction comprises values of the focusing frame area (Para. 0154: According to the depth image, the user selects the target area to be focused to form a corresponding large aperture map, as shown in Figures 4a and 4b, the focus point is different, and different large-aperture effects are formed; Para. 0157: focus and the orange frame represent the focus position, the focal length range is 0~1, the value will change when the position changes; Note: the values of the focusing frame area are the focal length range);
and selecting a specified image block as the focusing frame area based on the first 
The combination does not expressly disclose the following limitations: determining depth statistical values of pixel points of each image block; determining a first value range of a focusing frame area based on the depth statistical values; wherein the depth statistical values in the specified image block are equal to the values of the focusing frame area in the first operation instruction, and the focusing frame area represents a position of the selected focusing point.
However, Kang et al. teaches, determining depth statistical values of pixel points of each image block (Para. 0013: the step of judging the depth information corresponding to the target object based on the optimized three-dimensional depth image and obtaining the focus position includes: selecting a block that includes the target object, and reading multiple neighbors in the block The depth information of neighborhood pixels is statistically calculated on the depth information of these neighborhood pixels to obtain the optimized depth information of the target object; Para. 0056: the method of performing this statistical operation may be average operation (mean), mode operation (mod), median operation (median), minimum operation (minimum), quartile (quartile) or other suitable methods. Mathematical and statistical operations. In more detail, the average operation refers to the average depth information of this block as the optimized depth information for the subsequent auto-focusing steps); 

wherein the depth statistical values in the specified image block are equal to the values of the focusing frame area in the first operation instruction, and the focusing frame area represents a position of the selected focusing point (Para. 0055: Step S150 shown in Fig. 4 judges the depth information corresponding to the target object according to the optimized three-dimensional depth map, and obtains the focus position of the target object according to the depth information…The depth information performs statistical calculations to obtain the optimized depth information of the target object; Note: the depth statistic information of each 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention was made to include determining depth statistical values of the points as taught by Kang et al. into the combined image processing of Lou et al., Zheng et al. and Shuifa et al. in order to more reliably calculate the effective depth information of the target, so as to avoid the possibility of focusing on an incorrect target (Kang et al. Para. 0055).
Regarding claim 14, the combination of Lou et al., Zheng et al., and Shuifa et al. teaches the limitations as explained in claim 9 above.
Zheng et al. further teaches, wherein the at least one processor is further configured to read and execute instructions stored in the memory to: obtain image blocks by segmenting the image based on a size of a focusing frame (Para. 0194: the device 500 may include one or more of the following components: a processing component 502, a memory 504; Para. 0153: When the user takes a photo, the aforementioned depth prediction model is used to obtain the depth image of the corresponding photo; Para. 0154: According to the depth image, the user selects the target area to be focused to form a corresponding large aperture map, as shown in Figures 4a and 4b, the focus point is different, and different large-aperture effects are formed; Para. 0157: focus and the orange frame represent the focus position, the focal length range is 0~1, the value will change when the position changes);

and select a specified image block as the focusing frame area based on the first operation instruction (Para. 0154: According to the depth image, the user selects the target area to be focused to form a corresponding large aperture map, as shown in Figures 4a and 4b, the focus point is different, and different large-aperture effects are formed; Para. 0157: focus and the orange frame represent the focus position, the focal length range is 0~1, the value will change when the position changes; Note: the user selects a target area (i.e. image block)).
The combination of Lou et al., Zheng et al. and Shuifa et al. does not expressly disclose the following limitations: determine depth statistical values of pixel points of each image block; determine a first value range of a focusing frame area based on the depth statistical values; wherein the depth statistical values in the specified image block are equal to the values of the focusing frame area in the first operation instruction, and the focusing frame area represents a position of the selected focusing point.
However, Kang et al. teaches, determine depth statistical values of pixel points of each image block (Para. 0013: the step of judging the depth information corresponding to the target object based on the optimized three-dimensional depth image and obtaining the focus position 
determine a first value range of a focusing frame area based on the depth statistical values (Para. 0009: the target is selected, and the first and second image sensors are used to photograph the target to generate the first image and the second image. And perform a three-dimensional depth estimation based on the first image and the second image to generate a three-dimensional depth map. And optimize the three-dimensional depth map to generate an optimized three-dimensional depth map. It also judges the depth information corresponding to the target object based on the optimized three-dimensional depth map, and obtains the focus position of the target object based on the depth information. Then, the auto-focus device is driven to execute the auto-focus procedure according to the focus position; Para. 0056: the method of performing this statistical operation may be average operation (mean), mode operation (mod), median operation (median), minimum operation (minimum), quartile (quartile) or other suitable methods. Mathematical and statistical operations. In more detail, the average operation refers to the average depth information of this block as the optimized depth information for the subsequent auto-focusing steps; Note: the range of the value of the 
wherein the depth statistical values in the specified image block are equal to the values of the focusing frame area in the first operation instruction, and the focusing frame area represents a position of the selected focusing point (Para. 0055: Step S150 shown in Fig. 4 judges the depth information corresponding to the target object according to the optimized three-dimensional depth map, and obtains the focus position of the target object according to the depth information…The depth information performs statistical calculations to obtain the optimized depth information of the target object; Note: the depth statistic information of each image block is determined, and then the value of the focus frame is controlled by Focus, and the corresponding image block is selected as the focus frame region after matching with the depth statistic information of the image block. In addition, the range of the value of the focus frame region coincides with the range of the depth statistic value of the image block).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention was made to include determining depth statistical values of the points as taught by Kang et al. into the combined image processing of Lou et al., Zheng et al. and Shuifa et al. in order to more reliably calculate the effective depth information of the target, so as to avoid the possibility of focusing on an incorrect target (Kang et al. Para. 0055).
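For illustration only, the per-block depth statistics discussed with respect to Kang et al. (Para. 0056 lists mean, mode, median, minimum, and quartile operations) can be sketched as below. The names `block_statistics`, `value_range`, and `select_block` are hypothetical. Selecting the block whose statistic is nearest to the requested value is an assumption of this sketch; the claim language recites equality.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Sequence, Tuple

Grid = List[List[float]]
Pos = Tuple[int, int]  # (top row, left column) of a block


def block_statistics(
    depth: Grid, block: int, stat: Callable[[Sequence[float]], float] = mean
) -> Dict[Pos, float]:
    """Split the depth map into block x block tiles and compute one
    statistic (mean by default) per tile."""
    h, w = len(depth), len(depth[0])
    stats: Dict[Pos, float] = {}
    for top in range(0, h, block):
        for left in range(0, w, block):
            values = [
                depth[y][x]
                for y in range(top, min(top + block, h))
                for x in range(left, min(left + block, w))
            ]
            stats[(top, left)] = stat(values)
    return stats


def value_range(stats: Dict[Pos, float]) -> Tuple[float, float]:
    """Range of selectable focus values derived from the block statistics."""
    return min(stats.values()), max(stats.values())


def select_block(stats: Dict[Pos, float], focus_value: float) -> Pos:
    """Pick the block whose statistic is closest to the requested value."""
    return min(stats, key=lambda pos: abs(stats[pos] - focus_value))


depth_map = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [2, 2, 9, 9],
    [2, 2, 9, 9],
]
stats = block_statistics(depth_map, block=2)
chosen = select_block(stats, focus_value=5)
```

Passing `stat=median` or `stat=min` swaps in another of the listed statistical operations without changing the rest of the sketch.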
Response to Arguments
Applicant's arguments filed on 9/7/2021 have been fully considered but they are not persuasive.
Applicant, on pages 9-14 of the remarks, argues Zheng et al. and Shuifa et al. in 
The Examiner respectfully disagrees. Zheng et al. teaches in Para. 0191 that the focus depth value (i.e. first depth value) and the pixel position depth value of the pixel point (i.e. second depth value) are determined for each pixel in the target image to be blurred (i.e. creates a distribution of depth values). Para. 0131 further shows that the focus depth is selected (i.e. if the focus depth is 3m) as well as the pixel depths. Para. 0191 also shows that the blurred pixel value of the pixel point is determined according to the blurred image and a difference between the pixel depth positions (i.e. absolute difference) is determined. Zheng et al. teaches in Para. 0138 that a first distance is determined when blurring the image. The first distance is used to determine the absolute difference (i.e. absolute value of the difference between the first distance…and the smallest positive integer greater than the first distance). Therefore, the first distance is a distance measurement (i.e. distance or radius of 1 meter) and not a distance difference. Additionally, Para. 0138 teaches a positive relationship between the blurred radius and absolute difference in that there is more blur (i.e. higher degree of blur of the absolute 
Shuifa et al. teaches on Pgs. 2-3 that different kernels are used to convolve the images in which a Gaussian blur is selected (i.e. pre-selected convolution kernel). On Pg. 5, Shuifa et al. discloses that the Gaussian blur has a radius that is set in the radius window. The standard deviation or sigma is further specified as the radius in FIG. 11-2-3. Additionally, Shuifa et al. teaches on Pgs. 2-3 that each pixel is blurred by the Gaussian filter in an area with a set standard deviation (i.e. area of distribution) and the standard deviation or sigma in FIG. 11-2-5 consists of specifying x and y values (i.e. radii) for the Gaussian blur.
All remaining arguments are reliant on the aforementioned and addressed arguments and thus are considered to be wholly addressed herein. The office action has also been updated to address the applicant's argument. See the updated review comments above (in bold) for details.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final 
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daniella M. DiGuglielmo whose telephone number is 571-272-2682.  The examiner can normally be reached on Monday - Friday 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached on 571-272-7882.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.







/PING Y HSIEH/Primary Examiner, Art Unit 2664