DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The claim amendment filed on 08/26/2022 has been entered. Claims 1-20 are pending.
Response to Arguments/Remarks
Applicant's response to the 35 U.S.C. §§ 102, 103 rejections (Remarks pages 9-12) with respect to Claims 1-20 has been fully considered along with the claim amendments. The submitted amendments to the claims are considered significant enough to modify the interpretation of the claim limitations. Examiner conducted an updated search and identified prior art that teaches the amended claim limitations; that prior art is incorporated into the updated rejections.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 2, 15 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Both claims 2, 15 are amended to include the limitation "during inference." The 08/26/2022 Remarks did not provide sufficient support for, or citation to, this limitation in the Specification, and Examiner's search of the Specification did not identify such support. The ordinary meaning of the term "inference" is open to multiple interpretations, as further described below. The claim amendment "during inference" constitutes new subject matter and is therefore not patentable in this application.

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 15 each recite the limitation "during inference," which was added by amendment. There is insufficient antecedent basis for this limitation in the claims. The 08/26/2022 Remarks did not provide sufficient support for this limitation or a citation to the Specification supporting it, and Examiner's search of the Specification did not identify such support. The ordinary meaning of the term "during inference" is open to multiple interpretations, such as inference with respect to time, location, data acquisition or data analysis. The limitation is therefore unclear in its meaning or interpretation. The claim amendment "during inference" lacks sufficient antecedent basis and is therefore not patentable in this application.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6-10, 14-15, 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kwon et al (US PG PUB 2020/0218979) in view of Urtasun et al (US 2020/0160559).
Regarding Claim 1, Kwon et al teach a depth system (system on chip 2104; Fig 21C and ¶ [0201]), comprising: one or more processors (CPU 2106, GPU 2108, processors 2110; Fig 21C and ¶ [0201]); a memory communicably coupled to the one or more processors (memory is part of the SoC 2104 to unify memory of the CPU 2106 and GPU 2108; Fig 21C and ¶ [0197], [0210]) and storing: a network module including instructions that, when executed by the one or more processors, cause the one or more processors (data store 2116 of SoC 2104 stores neural networks including instructions for object detection, executed on processor; Figs 14, 18, 21C and ¶ [0160], [0210]-[0211], [0225]-[0226]) to: generate depth features from sensor data according to whether the sensor data includes sparse depth data (sensor data 102 is generated from sensors of vehicle 2100 including predicting data from sparse data (via sampling); Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0161], [0210]-[0211]), generate a depth map from at least a monocular image using the depth model that is guided by the depth features when injected (a front-facing monocular camera of vehicle 2100 is used for image generation in sensor data 102 for predicting depth information using the machine learning model 104; Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0162], [0210]-[0211]) and provide the depth map as depth estimates of objects represented in the monocular image (predicted depth maps of objects are generated that correspond to the input images at the same spatial resolution; Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0162], [0225]).  
	Kwon et al does not teach selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data.
Urtasun et al is analogous art pertinent to the technological problem addressed in this application and teaches selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data (a depth completion model 214 shares depth features with the image backbone network 204 and applies convolution layers and up-sampling layers to predict a dense pixel-wise depth 222 image; Fig 2 and ¶ [0087]-[0091]). 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of this application, to combine the teachings of Kwon et al with Urtasun et al, including to selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data. Use of a machine-learned model that can multi-task over both images and sensor data and interpolate from sparse data to detect depth overcomes the challenge of detecting objects that are occluded or far away, as recognized by Urtasun et al (¶ [0021]).
Regarding Claim 2, Kwon et al in view of Urtasun et al teach the depth system of claim 1 (as described above), wherein Kwon et al teach the network module includes instructions to generate the depth features (data store 2116 of SoC 2104 stores neural networks including instructions for object detection (including depth features), executed on processor; Figs 18, 21C and ¶ [0160]-[0161], [0197], [0225]-[0226]) including instructions to use a sparse auxiliary network that is a convolutional encoder to generate the depth features from the sparse depth data during inference (the free-space data 1402 generated using a neural network that is used to predict (via sampling) and generate depth map (features) from sparse data can also undergo ground truth encoding 110; Figs 14, 18, 21C and ¶ [0144], [0146], [0149], [0156]), and wherein the sparse depth data is part of the sensor data that is acquired from a range sensor (sparse Lidar data is part of the sensor data 102 used for generating the predicted depth maps using the machine learning model 104 and is acquired from a Lidar sensor 2160; Figs 14, 18, 21C and ¶ [0156], [0225]).
Regarding Claim 6, Kwon et al in view of Urtasun et al teach the depth system of claim 1 (as described above), wherein Kwon et al teach the network module includes instructions (the Reduced Instruction Set Computer (RISC) of accelerators 2114 in SoC 2104 stores instructions for image sensors; Fig 21C and ¶ [0214]-[0215]) to acquire the sensor data including at least the monocular image from at least one sensor of a device (a front-facing monocular camera 2170 of vehicle 2100 is used for image generation in sensor data 102; Figs 21B, 21C and ¶ [0146], [0192]-[0193]), wherein the network module includes instructions to generate the depth features (data store 2116 of SoC 2104 stores neural networks including instructions for object detection (including depth features), executed on processor; Figs 18, 21C and ¶ [0160]-[0161], [0197], [0225]-[0226]) including instructions to determine whether the sensor data includes sparse depth data in addition to the monocular image (free space data 1402 and distances 1408 are determined to identify if data is sparse from the machine learning model 104; Figs 14, 17 and ¶ [0146], [0156]-[0157]), and activating a sparse auxiliary network to generate the depth features from the sparse depth data when the sparse depth data is present (after the machine learning model 104 identifies an object 116 (depth feature generation) the predicted depth map then undergoes sampling 1406 to better predict object features by increasing resolution through increasing free-space distance 1408 points; Figs 14, 17 and ¶ [0156]-[0157]).
Regarding Claim 7, Kwon et al in view of Urtasun et al teach the depth system of claim 1 (as described above), wherein Kwon et al teach providing the depth map includes controlling a device to navigate through a surrounding environment according to the depth map that identifies distances to objects in the surrounding environment (the vehicle 2100 includes controllers 2136 that may include the SoC 2104 to operate the vehicle in response to sensor data that includes identification of distances to objects in the environment, with the SoC depth map data incorporated into controlling the vehicle through the network interface 2124; Fig 21C and ¶ [0184]-[0185], [0201]).  
Regarding Claim 8, Kwon et al in view of Urtasun et al teach the depth system of claim 1 (as described above), wherein Kwon et al teach the depth system is integrated within a device for autonomously controlling a vehicle (the vehicle 2100 includes the SoC 2104 for depth perception and use for autonomous driving; Fig 21C and ¶ [0201], [0214], [0222]).

Regarding Claim 9, Kwon et al teach a non-transitory computer-readable medium including instructions that when executed by one or more processors cause the one or more processors (data store 2116 of SoC 2104 is understood as a non-transitory computer-readable medium that stores neural networks including instructions for object detection, executed on processor; Fig 21C and ¶ [0197], [0225]-[0226]) to: generate depth features from sensor data according to whether the sensor data includes sparse depth data (sensor data 102 is generated from sensors of vehicle 2100 including predicting data from sparse data (via sampling); Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0161], [0210]-[0211]), generate a depth map from at least a monocular image using the depth model that is guided by the depth features when injected (a front-facing monocular camera of vehicle 2100 is used for image generation in sensor data 102 for predicting depth information using the machine learning model 104; Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0162], [0210]-[0211]) and provide the depth map as depth estimates of objects represented in the monocular image (predicted depth maps of objects are generated that correspond to the input images at the same spatial resolution; Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0162], [0225]).  

Kwon et al does not teach selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data.
Urtasun et al is analogous art pertinent to the technological problem addressed in this application and teaches selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data (a depth completion model 214 shares depth features with the image backbone network 204 and applies convolution layers and up-sampling layers to predict a dense pixel-wise depth 222 image; Fig 2 and ¶ [0087]-[0091]). 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of this application, to combine the teachings of Kwon et al with Urtasun et al, including to selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data. Use of a machine-learned model that can multi-task over both images and sensor data and interpolate from sparse data to detect depth overcomes the challenge of detecting objects that are occluded or far away, as recognized by Urtasun et al (¶ [0021]).
Regarding Claim 10, Kwon et al in view of Urtasun et al teach the non-transitory computer-readable medium of claim 9 (as described above), wherein Kwon et al teach the instructions to generate the depth features include instructions (data store 2116 of SoC 2104 stores neural networks including instructions for object detection, executed on processor; Fig 21C and ¶ [0197], [0225]-[0226]) to generate the depth features including instructions to use a sparse auxiliary network that is a convolutional encoder to generate the depth features from the sparse depth data (the free-space data 1402 generated using a neural network that is used to predict (via sampling) and generate depth map (features) from sparse data can also undergo ground truth encoding 110; Figs 14, 18, 21C and ¶ [0144], [0146], [0149], [0156]), and wherein the sparse depth data is part of the sensor data that is acquired from a range sensor (sparse Lidar data is part of the sensor data 102 used for generating the predicted depth maps using the machine learning model 104 and is acquired from a Lidar sensor 2160; Figs 14, 18, 21C and ¶ [0156], [0225]).

Regarding Claim 14, Kwon et al teach a method (process 1400 of using system on chip 2104 for depth detection of objects; Figs 14, 18 and ¶ [0143], [0160]-[0161]), comprising: generating depth features from sensor data according to whether the sensor data includes sparse depth data (sensor data 102 is generated from sensors of vehicle 2100 including predicting data from sparse data (via sampling); Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0161], [0210]-[0211]); generating a depth map from at least a monocular image using the depth model that is guided by the depth features when injected (a front-facing monocular camera of vehicle 2100 is used for image generation in sensor data 102 for predicting depth information using the machine learning model 104; Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0162], [0210]-[0211]); and providing the depth map as depth estimates of objects represented in the monocular image (predicted depth maps of objects are generated that correspond to the input images at the same spatial resolution; Figs 14, 18, 21C and ¶ [0146], [0156], [0160]-[0162], [0225]).
Kwon et al does not teach selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data.
Urtasun et al is analogous art pertinent to the technological problem addressed in this application and teaches selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data (a depth completion model 214 shares depth features with the image backbone network 204 and applies convolution layers and up-sampling layers to predict a dense pixel-wise depth 222 image; Fig 2 and ¶ [0087]-[0091]). 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of this application, to combine the teachings of Kwon et al with Urtasun et al, including to selectively inject the depth features into a depth model when the depth features are available according to a presence of the sparse depth data in the sensor data. Use of a machine-learned model that can multi-task over both images and sensor data and interpolate from sparse data to detect depth overcomes the challenge of detecting objects that are occluded or far away, as recognized by Urtasun et al (¶ [0021]).
Regarding Claim 15, Kwon et al in view of Urtasun et al teach the method of claim 14 (as described above), wherein Kwon et al teach generating the depth features includes using a sparse auxiliary network that is a convolutional encoder to generate the depth features from the sparse depth data during inference (the free-space data 1402 generated using a neural network that is used to predict (via sampling) and generate depth map (features) from sparse data can also undergo ground truth encoding 110; Figs 14, 18, 21C and ¶ [0144], [0146], [0149], [0156]), and wherein the sparse depth data is part of the sensor data that is acquired from a range sensor (sparse Lidar data is part of the sensor data 102 used for generating the predicted depth maps using the machine learning model 104 and is acquired from a Lidar sensor 2160; Figs 14, 18, 21C and ¶ [0156], [0225]).
Regarding Claim 19, Kwon et al in view of Urtasun et al teach the method of claim 14 (as described above), wherein Kwon et al further teach acquiring the sensor data including at least the monocular image from at least one sensor of a device (a front-facing monocular camera 2170 of vehicle 2100 is used for image generation in sensor data 102; Figs 21B, 21C and ¶ [0146], [0192]-[0193]), wherein generating the depth features includes determining whether the sensor data includes sparse depth data in addition to the monocular image (free space data 1402 and distances 1408 are determined to identify if data is sparse from the machine learning model 104; Figs 14, 17 and ¶ [0146], [0156]-[0157]), and activating a sparse auxiliary network to generate the depth features from the sparse depth data when the sparse depth data is present (after the machine learning model 104 identifies an object 116 (depth feature generation) the predicted depth map then undergoes sampling 1406 to better predict object features by increasing resolution through increasing free-space distance 1408 points; Figs 14, 17 and ¶ [0156]-[0157]).
Regarding Claim 20, Kwon et al in view of Urtasun et al teach the method of claim 14 (as described above), wherein Kwon et al teaches providing the depth map includes controlling a device to navigate through a surrounding environment according to the depth map that identifies distances to objects in the surrounding environment (the vehicle 2100 includes controllers 2136 that may include the SoC 2104 to operate the vehicle in response to sensor data that includes identification of distances to objects in the environment, with the SoC depth map data incorporated into controlling the vehicle through the network interface 2124; Fig 21C and ¶ [0184]-[0185], [0201]).


Claims 3-4, 11-12, 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kwon et al (US PG PUB 2020/0218979) in view of Urtasun et al (US 2020/0160559) and Popov et al (CN 112825134).
Regarding Claim 3, Kwon et al in view of Urtasun et al teach the depth system of claim 1 (as described above), wherein Urtasun et al teach the network module includes instructions to selectively inject the depth features (a depth completion model 214 shares depth features with the image backbone network 204 and applies convolution layers and up-sampling layers to predict a dense pixel-wise depth 222 image; Fig 2 and ¶ [0087]-[0091]) including instructions to, in response to determining that the sensor data includes the sparse depth data (sparse depth image 210 is generated with sub-pixel or no pixel identified, especially at long range; ¶ [0086]-[0088]), inject the depth features into the depth model by concatenating the depth features with image features (the sparse depth image 210 can be concatenated with the RGB image 208 and fed to the image backbone model 204; Fig 2 and ¶ [0088]) from an encoder of the depth model (communication over the network 840 between the computing systems 810 and machine learning system 850 can be accomplished with encoding; ¶ [0139]-[0140]).
Kwon et al in view of Urtasun et al does not teach to provide concatenated features into a decoder of the depth model.
Popov et al is analogous art pertinent to the problem solved in this application including to provide concatenated features into a decoder of the depth model (the machine learning model 108 includes using an encoder and decoder for the depth feature extractor 310 of the Radar data 106 and upsampling the feature map to improve the resolution (injecting depth features); Figs 1, 3 and ¶ [0050]).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Kwon et al and Urtasun et al with Popov et al including to provide concatenated features into a decoder of the depth model. Use of an encoder and decoder allows for the contraction and expansion of data during the convolutional neural network processing with a skip connection allowing for improving the feature map resolution, as recognized by Popov et al (¶ [0050]).
Regarding Claim 4, Kwon et al in view of Urtasun et al and Popov et al teach the depth system of claim 3 (as described above), wherein Popov et al teaches the network module includes instructions (the SoC 1304 uses the accelerator 1314 for executing the CNN image processing for object detection; Fig 13C and ¶ [0158]-[0159]) to inject the depth features including instructions to apply learned weights to the depth features and the image features (the neural network can use weights for the detected objects; Fig 13C and ¶ [0149]-[0150], [0164]) prior to concatenating via skip connections of the depth model (a skip connection 312 is used in the depth feature extractor 310 of the machine learning model 108; Figs 1, 3 and ¶ [0050]).

Regarding Claim 11, Kwon et al in view of Urtasun et al teach the non-transitory computer-readable medium of claim 9 (as described above), wherein Urtasun et al teach instructions to selectively inject the depth features (a depth completion model 214 shares depth features with the image backbone network 204 and applies convolution layers and up-sampling layers to predict a dense pixel-wise depth 222 image; Fig 2 and ¶ [0087]-[0091]) include instructions to, in response to determining that the sensor data includes the sparse depth data (sparse depth image 210 is generated with sub-pixel or no pixel identified, especially at long range; ¶ [0086]-[0088]), inject the depth features into the depth model by concatenating the depth features with image features (the sparse depth image 210 can be concatenated with the RGB image 208 and fed to the image backbone model 204; Fig 2 and ¶ [0088]) from an encoder of the depth model (communication over the network 840 between the computing systems 810 and machine learning system 850 can be accomplished with encoding; ¶ [0139]-[0140]).
Kwon et al in view of Urtasun et al does not teach to provide concatenated features into a decoder of the depth model.
Popov et al is analogous art pertinent to the problem solved in this application including to provide concatenated features into a decoder of the depth model (the machine learning model 108 includes using an encoder and decoder for the depth feature extractor 310 of the Radar data 106 and upsampling the feature map to improve the resolution (injecting depth features); Figs 1, 3 and ¶ [0050]).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Kwon et al and Urtasun et al with Popov et al including to provide concatenated features into a decoder of the depth model. Use of an encoder and decoder allows for the contraction and expansion of data during the convolutional neural network processing with a skip connection allowing for improving the feature map resolution, as recognized by Popov et al (¶ [0050]).
Regarding Claim 12, Kwon et al in view of Urtasun et al and Popov et al teach the non-transitory computer-readable medium of claim 11 (as described above), wherein Popov et al teaches the instructions (the SoC 1304 uses the accelerator 1314 for executing the CNN image processing for object detection; Fig 13C and ¶ [0158]-[0159]) to inject the depth features including instructions to apply learned weights to the depth features and the image features (the neural network can use weights for the detected objects; Fig 13C and ¶ [0149]-[0150], [0164]) prior to concatenating via skip connections of the depth model (a skip connection 312 is used in the depth feature extractor 310 of the machine learning model 108; Figs 1, 3 and ¶ [0050]).

Regarding Claim 16, Kwon et al in view of Urtasun et al teach the method of claim 14 (as described above), wherein Urtasun et al teach selectively injecting the depth features (a depth completion model 214 shares depth features with the image backbone network 204 and applies convolution layers and up-sampling layers to predict a dense pixel-wise depth 222 image; Fig 2 and ¶ [0087]-[0091]) includes, in response to determining that the sensor data includes the sparse depth data (sparse depth image 210 is generated with sub-pixel or no pixel identified, especially at long range; ¶ [0086]-[0088]), injecting the depth features into the depth model by concatenating the depth features with image features (the sparse depth image 210 can be concatenated with the RGB image 208 and fed to the image backbone model 204; Fig 2 and ¶ [0088]) from an encoder of the depth model (communication over the network 840 between the computing systems 810 and machine learning system 850 can be accomplished with encoding; ¶ [0139]-[0140]).
Kwon et al in view of Urtasun et al does not teach to provide concatenated features into a decoder of the depth model.
Popov et al is analogous art pertinent to the problem solved in this application including to provide concatenated features into a decoder of the depth model (the machine learning model 108 includes using an encoder and decoder for the depth feature extractor 310 of the Radar data 106 and upsampling the feature map to improve the resolution (injecting depth features); Figs 1, 3 and ¶ [0050]).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Kwon et al and Urtasun et al with Popov et al including to provide concatenated features into a decoder of the depth model. Use of an encoder and decoder allows for the contraction and expansion of data during the convolutional neural network processing with a skip connection allowing for improving the feature map resolution, as recognized by Popov et al (¶ [0050]).
Regarding Claim 17, Kwon et al in view of Urtasun et al and Popov et al teach the method of claim 16 (as described above), wherein Popov et al teaches injecting the depth features includes applying learned weights to the depth features and the image features (the neural network can use weights for the detected objects; Fig 13C and ¶ [0149]-[0150], [0164]) prior to concatenating via skip connections of the depth model (a skip connection 312 is used in the depth feature extractor 310 of the machine learning model 108; Figs 1, 3 and ¶ [0050]).  

Claims 5, 13, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kwon et al (US PG PUB 2020/0218979) in view of Urtasun et al (US 2020/0160559) and Yang et al (US PG PUB 2019/0387209).
Regarding Claim 5, Kwon et al in view of Urtasun et al teach the depth system of claim 1 (as described above), wherein Urtasun et al teaches the network module includes instructions to generate the depth map including instructions to apply the depth model to the monocular image (the sparse depth image 210 can be concatenated with the RGB image 208 and fed to the image backbone model 204; Fig 2 and ¶ [0088]) by using an encoder of the depth model to encode image features (communication over the network 840 between the computing systems 810 and machine learning system 850 can be accomplished with encoding; ¶ [0139]-[0140]).  
	Kwon et al in view of Urtasun et al does not teach to use a decoder of the depth model to decode the depth features into the depth map, and wherein the network module includes instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder.
	Yang et al is analogous art pertinent to the problem solved in this application including to apply the depth model to the monocular image by using an encoder of the depth model to encode image features (a monocular camera 105 is used to produce a single image that is input in the encoder of the encoder-decoder 202 of a convolution neural network for additive residual signals for the depth map 217; Figs 1, 2 and ¶ [0024], [0031]-[0033]), and to use a decoder of the depth model to decode the depth features into the depth map (the decoder of the encoder-decoder 202 of the convolution neural network is used to upproject feature maps; ¶ [0033]), and wherein the network module includes instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder (the decoder architecture 202 will use skip-connections to enable high-resolution results of the reconstruction and the resolution for each layer is performed such that the resolutions are separate for each layer; ¶ [0033]-[0035]).
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Kwon et al and Urtasun et al with Yang et al, including to apply the depth model to the monocular image by using an encoder of the depth model to encode image features and to use a decoder of the depth model to decode the depth features into the depth map, and wherein the network module includes instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder. By using an encoder-decoder with skip-connections between the encoder and decoder for each of the convolution layers, the decoder can recover high-resolution results with fine-grained details for each layer, thereby improving all resolutions within the depth image, as recognized by Yang et al (¶ [0033]).

Regarding Claim 13, Kwon et al in view of Urtasun et al teach the non-transitory computer-readable medium of claim 9 (as described above), wherein Urtasun et al teach instructions to generate the depth map include instructions to apply the depth model to the monocular image (the sparse depth image 210 can be concatenated with the RGB image 208 and fed to the image backbone model 204; Fig 2 and ¶ [0088]) by using an encoder of the depth model to encode image features (communication over the network 840 between the computing systems 810 and machine learning system 850 can be accomplished with encoding; ¶ [0139]-[0140]).  
	Kwon et al in view of Urtasun et al does not teach to use a decoder of the depth model to decode the depth features into the depth map, and wherein the instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder.  
	Yang et al is analogous art pertinent to the problem solved in this application including to apply the depth model to the monocular image by using an encoder of the depth model to encode image features (a monocular camera 105 is used to produce a single image that is input in the encoder of the encoder-decoder 202 of a convolution neural network for additive residual signals for the depth map 217; Figs 1, 2 and ¶ [0024], [0031]-[0033]), and to use a decoder of the depth model to decode the depth features into the depth map (the decoder of the encoder-decoder 202 of the convolution neural network is used to upproject feature maps; ¶ [0033]), and wherein the instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder (the decoder architecture 202 will use skip-connections to enable high-resolution results of the reconstruction and the resolution for each layer is performed such that the resolutions are separate for each layer; ¶ [0033]-[0035]).  
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Kwon et al and Urtasun et al with Yang et al to apply the depth model to the monocular image by using an encoder of the depth model to encode image features and to use a decoder of the depth model to decode the depth features into the depth map, and wherein the instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder.  By using an encoder-decoder with skip connections between the encoder and decoder for each of the convolution layers, the decoder can recover high-resolution results with fine-grained details for each layer, thereby improving all resolutions within the depth image, as recognized by Yang et al (¶ [0033]).

Regarding Claim 18, Kwon et al in view of Urtasun et al teach the method of claim 14 (as described above), wherein Urtasun et al teach generating the depth map includes applying the depth model to the monocular image (the sparse depth image 210 can be concatenated with the RGB image 208 and fed to the image backbone model 204; Fig 2 and ¶ [0088]) by using an encoder of the depth model to encode image features (communication over the network 840 between the computing systems 810 and machine learning system 850 can be accomplished with encoding; ¶ [0139]-[0140]).  
	Kwon et al in view of Urtasun et al does not teach to use a decoder of the depth model to decode the depth features into the depth map, and wherein the network module includes instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder.  
	Yang et al is analogous art pertinent to the problem solved in this application including to apply the depth model to the monocular image by using an encoder of the depth model to encode image features (a monocular camera 105 is used to produce a single image that is input in the encoder of the encoder-decoder 202 of a convolution neural network for additive residual signals for the depth map 217; Figs 1, 2 and ¶ [0024], [0031]-[0033]), and to use a decoder of the depth model to decode the depth features into the depth map (the decoder of the encoder-decoder 202 of the convolution neural network is used to upproject feature maps; ¶ [0033]), and wherein the network module includes instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder (the decoder architecture 202 will use skip-connections to enable high-resolution results of the reconstruction and the resolution for each layer is performed such that the resolutions are separate for each layer; ¶ [0033]-[0035]).  
	It would have been obvious to one of ordinary skill in the art to combine the teachings of Kwon et al and Urtasun et al with Yang et al to apply the depth model to the monocular image by using an encoder of the depth model to encode image features and to use a decoder of the depth model to decode the depth features into the depth map, wherein the network module includes instructions to decode the image features at separate spatial resolutions as provided by skip connections between the encoder and the decoder in combination with an output of a previous layer of the decoder.  By using an encoder-decoder with skip connections between the encoder and decoder for each of the convolution layers, the decoder can recover high-resolution results with fine-grained details for each layer, thereby improving all resolutions within the depth image, as recognized by Yang et al (¶ [0033]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Redford et al (WO 2020/188121) teaches a method and system for estimating depth of structures using a combination of images with lidar data.
	Moloney et al (CN 110383340) teaches generation of data from sparse volume data generated from depth sensor to improve the perception of objects observed from an autonomous vehicle.
	Smolyanskiy et al (US PG PUB 2019/0295282) teaches a system for depth estimation using convolution layers and depth data analysis including monocular images for training the depth detection for sparse depth data.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHLEEN M BROUGHTON whose telephone number is (571)270-7380. The examiner can normally be reached Monday-Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at 571-272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KATHLEEN M BROUGHTON/Examiner, Art Unit 2667   

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667