Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Claims 1 – 20 are pending in this application. Claims 1, 8 and 15 are independent.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 6, 8 – 13 and 15 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Stojanovic, Milos (US-20190050648-A1, hereinafter "Milos").

Regarding independent claim 1, Milos teaches:
A vehicle communication and control system (See at least Milos, ¶ [0054], FIG. 1, "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130.…"), comprising:
a first vehicle (e.g., Vehicle 104 (FIG. 1) of Milos) in signal communication with a remote computing system (e.g., SME computing device 102 (FIG. 1) of Milos) (See at least Milos, ¶ [0054, 0058], FIG. 1, "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…"), the first vehicle including a sensor (e.g., one or more imagers (such as but not limited to a camera) included in the vehicle of Milos) configured to capture a raw image having a first image volume and including at least one target object (See at least Milos, ¶ [0054, 0058], FIGS. 1, 2, 5 and 8 – 10; "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…");
an image encoder (e.g., CNN 200 includes encoder (or downsampling) layers 220 (FIG. 2) of Milos) included in the first vehicle and configured to convert (e.g., by downsampling input image 202 of Milos) the raw image into a masked image having a second image volume (e.g., pixels of Milos) that is less than the first image volume (e.g., the encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values) in Milos) (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…"); and
a segmentation unit (e.g., Visual image semantic segmenter 144 of Milos) included in the remote computing system (e.g., system 102 (FIG. 1) of Milos), the segmentation unit configured to determine the at least one target object (e.g., FIG. 2, #272 of Milos) from the masked image (e.g., FIG. 2, #204 of Milos), to generate a masked segmented image (e.g., FIG. 2, #204 of Milos) including a sparse segmentation (e.g., FIG. 2, #272, 278, 260 of Milos) of the at least one target object (e.g., FIG. 2, #272 of Milos) (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…"), and to convert (e.g., by upsampling to generate semantic output image 204 of Milos) the sparse segmentation of the at least one target object into at least one recovered segmented target object (e.g., semantic output image 204 (FIG. 2) of Milos) indicative of the at least one target object (e.g., FIG. 2, #272 of Milos) (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…").
Milos teaches all of the subject matter of the claimed invention as set forth in the rejection above. However, these teachings appear in separate embodiments of Milos.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the separate embodiments of Milos for the purpose of enabling a dense approach for localization that employs all (or most) of the semantically-labelled pixels of semantic maps/images, which provides significant advantages over conventional methods for the performance of the inverse perspective mappings, as discussed in Milos (See ¶ [0033]), thereby helping to improve overall system robustness.
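For illustration only, the encoder/decoder reading relied upon above (downsampling by encoder layers 220 so that the encoded representation "includes less information" than the pixel values, followed by upsampling by decoder layers 240) can be pictured with the following minimal sketch. It is an assumption-laden stand-in and is not taken from Milos or from the application: fixed 2x2 average pooling and nearest-neighbour repetition replace the learned convolution and deconvolution layers, and the function names `downsample_2x2`, `upsample_2x` and `count_values` are hypothetical.

```python
# Illustrative sketch only: fixed pooling stands in for learned encoder/decoder layers.

def downsample_2x2(image):
    """Average-pool a 2D grid (even height and width) with a 2x2 window."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def upsample_2x(encoded):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in encoded:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def count_values(grid):
    """Number of stored values, used here as a crude proxy for data volume."""
    return sum(len(row) for row in grid)


# Hypothetical 4x4 single-channel image.
raw = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 0, 0],
    [4, 4, 0, 0],
]
encoded = downsample_2x2(raw)      # 2x2 grid, a quarter of the values
recovered = upsample_2x(encoded)   # back to 4x4
```

In this toy case the encoded grid holds 4 values against 16 in the input, and the upsampled output has the input's dimensions again. With less uniform input the recovered grid would only approximate the original.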

Regarding independent claims 8 and 15, Milos teaches:
A method of exchanging data with a vehicle (e.g., Vehicle 104 (FIG. 1) of Milos) (See at least Milos, ¶ [0054, 0058], FIGS. 1, 2, 5 and 8 – 10; "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…"), the method comprising:
capturing, via a sensor (e.g., one or more imagers (such as but not limited to a camera) included in the vehicle of Milos) included on the vehicle, a raw image having a first image volume and including at least one target object (See at least Milos, ¶ [0054, 0058], FIGS. 1, 2, 5 and 8 – 10; "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…");
generating, via an image encoder (e.g., CNN 200 includes encoder (or downsampling) layers 220 (FIG. 2) of Milos) included on the vehicle, a masked image having a second image volume that is less than the first image volume (e.g., the encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values) in Milos) (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…");
delivering the masked image to a convolution neural network (CNN) (e.g., CNN 200 of Milos) located remotely from the vehicle (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…"; see also ¶ [0115]);
determining, via the CNN, the at least one target object from the masked image generating a masked segmented image including a sparse segmentation of the at least one target object (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…"); and
converting (e.g., by upsampling to generate semantic output image 204 of Milos), via an image decoder located remotely from the vehicle, the sparse segmentation of the at least one target object into at least one recovered segmented target object indicative of the at least one target object (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…").
Milos teaches all of the subject matter of the claimed invention as set forth in the rejection above. However, these teachings appear in separate embodiments of Milos.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the separate embodiments of Milos for the purpose of enabling a dense approach for localization that employs all (or most) of the semantically-labelled pixels of semantic maps/images, which provides significant advantages over conventional methods for the performance of the inverse perspective mappings, as discussed in Milos (See ¶ [0033]), thereby helping to improve overall system robustness.

Regarding dependent claim 2, Milos teaches:
wherein the segmentation unit comprises: a convolution neural network (CNN) (e.g., CNN 200 of Milos) configured to generate the sparse segmentation of the at least one target object by applying a label to pixels associated with the at least one target object and excluding pixels disassociated with the at least one target object (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…"); and
an image decoder (e.g., CNN 200 includes decoder (upsampling) layers 240 (FIG. 2) of Milos) configured to generate the at least one recovered segmented target object based on the label applied to the pixels (See at least Milos, ¶ [0040, 0054, 0058, 0061], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…").

Regarding dependent claims 3, 10 and 17, Milos teaches:
generating the masked image according to a random mask (e.g., using dense semantic registration in Milos) and generating the at least one recovered segmented target object according to the random mask (See at least Milos, ¶ [0040, 0054, 0058, 0061, 0080], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…", "…transforming the visual-domain into the semantic-domain (for both maps and images) enables dense semantic registration (i.e., the employment of all or at least most pixels) methods that decrease the per pixel difference (i.e. cost function) in order to align drive-time image with the map…").
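For orientation only, the claim language addressed above ("generating the masked image according to a random mask" and recovering "according to the random mask") can be pictured with the following minimal sketch. It reflects one plain reading of the claim terms and is not drawn from Milos or from the application as filed. The function names `make_mask`, `apply_mask` and `recover`, the keep fraction, the seed and the zero fill value are all hypothetical.

```python
import random

# Illustrative sketch only: one possible reading of "random mask", under stated assumptions.

def make_mask(num_pixels, keep_fraction, seed):
    """Pick a reproducible random subset of pixel indices to keep."""
    rng = random.Random(seed)
    keep = max(1, int(num_pixels * keep_fraction))
    return sorted(rng.sample(range(num_pixels), keep))


def apply_mask(pixels, mask):
    """Keep only the pixel values at the masked-in indices (smaller data volume)."""
    return [pixels[i] for i in mask]


def recover(masked, mask, num_pixels, fill=0):
    """Place the kept values back at their original positions; other positions get a fill value."""
    out = [fill] * num_pixels
    for index, value in zip(mask, masked):
        out[index] = value
    return out


# Hypothetical flattened 4x4 image with 16 non-zero pixel values.
pixels = list(range(100, 116))
mask = make_mask(len(pixels), keep_fraction=0.25, seed=7)
masked = apply_mask(pixels, mask)              # 4 values instead of 16
recovered = recover(masked, mask, len(pixels))
```

Because the same seed reproduces the same mask, a remote side that knows the seed could place the received values at the correct positions without the index list being transmitted. That is a design assumption of this sketch only.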

Regarding dependent claims 4, 9 and 16, Milos teaches:
wherein the at least one recovered segmented target object is an approximation of a segmentation of the at least one target object (e.g., the approximate viewpoint from which the image and/or map may have been captured and/or generated in Milos) based on the original raw data (See at least Milos, ¶ [0040, 0054, 0058, 0061, 0080], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…", "…transforming the visual-domain into the semantic-domain (for both maps and images) enables dense semantic registration (i.e., the employment of all or at least most pixels) methods that decrease the per pixel difference (i.e. cost function) in order to align drive-time image with the map…").

Regarding dependent claims 5, 12 and 19, Milos teaches:
applying, via the CNN, a first label (e.g., semantically-labeled (i.e., semantic representations of) tangible objects…a dense approach for localization that employs all (or most) of the semantically-labelled pixels of semantic maps/images in Milos) to pixels associated with a first type of target object and applying a different second label (e.g., a semantic map may correspond to one or more semantic labels in Milos) to pixels associated with a second type of target object different from the first type of target object (See at least Milos, ¶ [0040, 0054, 0058, 0061, 0080], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…", "…transforming the visual-domain into the semantic-domain (for both maps and images) enables dense semantic registration (i.e., the employment of all or at least most pixels) methods that decrease the per pixel difference (i.e. cost function) in order to align drive-time image with the map…"; see also ¶ [0035]).

Regarding dependent claims 6, 13 and 20, Milos teaches:
prioritizing decoding (e.g., via a softmax function layer 246 in Milos) of the pixels associated with the first label over the pixels associated with the different second label (See at least Milos, ¶ [0040, 0054, 0058, 0061, 0062, 0080], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. 
More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…", "…A softmax function layer 246 may be employed to enable semantically classifying (or labeling) each region of the segmented image 204…", "…transforming the visual-domain into the semantic-domain (for both maps and images) enables dense semantic registration (i.e., the employment of all or at least most pixels) methods that decrease the per pixel difference (i.e. cost function) in order to align drive-time image with the map…" See also ¶ [0035]).
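For illustration only, the per-pixel semantic classification that Milos attributes to softmax function layer 246 can be sketched as follows. This is a minimal sketch and not code from Milos: the function names, the class indices, and the 2x2 score grid are hypothetical, and the raw class scores are assumed to come from the decoder layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one pixel's raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_pixels(score_map):
    """Assign each pixel the index of its most probable class."""
    labels = []
    for row in score_map:
        label_row = []
        for pixel_scores in row:
            probs = softmax(pixel_scores)
            label_row.append(probs.index(max(probs)))
        labels.append(label_row)
    return labels

# Hypothetical 2x2 image with three classes (0 = road, 1 = vehicle, 2 = sign).
scores = [
    [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]],
    [[0.1, 1.5, 0.4], [4.0, 0.0, 0.0]],
]
labels = label_pixels(scores)
```

In this sketch each pixel receives exactly one label, so pixels of a first object type and pixels of a second object type end up with different integer values in the label grid.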

Regarding dependent claims 11 and 18, Milos teaches:
applying, via the CNN, a label (e.g., semantically-labeled (i.e., semantic representations of) tangible objects…a dense approach for localization that employs all (or most) of the semantically-labelled pixels of semantic maps/images in Milos) to pixels included in the masked image to produce the masked segmented image (See at least Milos, ¶ [0040, 0054, 0058, 0061, 0080], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. 
More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…", "…transforming the visual-domain into the semantic-domain (for both maps and images) enables dense semantic registration (i.e., the employment of all or at least most pixels) methods that decrease the per pixel difference (i.e. cost function) in order to align drive-time image with the map…" See also ¶ [0035]); and
generating, via the image decoder, the at least one recovered segmented target object based on the masked segmented image (See at least Milos, ¶ [0040, 0054, 0058, 0061, 0080], FIGS. 1, 2, 5 and 8 – 10; "…The image data may be encoded in a pixel format. Thus, the pixel values of the semantic images encode semantic labels (or concepts) that correspond to the environment's tangible objects…", "…FIG. 1 illustrates an enhanced object localization system 100…Vehicle 104 may be enabled to (autonomously or semi-autonomously) drive and/or navigate over at least the drivable portions of the surface of the Earth 130…vehicle 104 may include an imaging system that is enabled is capture drive-time terrestrial-view visual images of vehicle's 104 environment…", "…Communication network 110 may communicatively couple image/map database 112, or any other storage device, to at least a portion of computing devices 102, 106, and 108, as well as any of mobile-imaging service providers 120…", "…Visual image semantic segmenter 144 may include and/or employ a CNN similar to CNN 200 to semantically segment the visual images. More specifically, CNN 200 implements a fully convolutional network (FCN) architecture that semantically segments an input visual image 202 to generate a corresponding output semantic image 204…The encoding layers 220 are generally responsible for detecting and/or recognizing features (e.g., latent and/or hidden features), via convolution operations, in visual input 202 and encoding the features within a representation of the imager (e.g., a vector embedding)…The encoded representation of input image 202 generally includes less information than the visual representation of input image 202 (i.e., pixel values)…The decoder layers 240 are generally responsible for decoding the downsampled representation of input image and generate the semantic output image 204, via deconvolution operations (i.e., transposing the convolution operations of the encoding layers 220). That is, decoder layers 240 decode the representation of input image 202 via upsampling to generate semantic output image 204…", "…transforming the visual-domain into the semantic-domain (for both maps and images) enables dense semantic registration (i.e., the employment of all or at least most pixels) methods that decrease the per pixel difference (i.e. cost function) in order to align drive-time image with the map…" See also ¶ [0035]).
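For illustration only, the downsampling and upsampling roles that the quoted passages assign to encoding layers 220 and decoder layers 240 can be sketched as follows. This is a simplified stand-in and not the learned convolution and deconvolution layers of Milos: fixed 2x2 max pooling and nearest-neighbor upsampling are substituted, and all names and values are hypothetical.

```python
def max_pool_2x2(img):
    """Downsample a 2-D grid by keeping the maximum of each 2x2 block."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]) - 1, 2)]
        for r in range(0, len(img) - 1, 2)
    ]

def upsample_nearest_2x(img):
    """Upsample a 2-D grid by repeating every value over a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

# Hypothetical 4x4 single-channel image.
image = [
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 7, 8],
    [6, 0, 9, 1],
]
encoded = max_pool_2x2(image)           # 2x2 grid: fewer values than the input
decoded = upsample_nearest_2x(encoded)  # back to 4x4, fine detail is not restored
```

The encoded grid holds four values where the input held sixteen, which mirrors the quoted statement that the encoded representation includes less information than the pixel values of the input image.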

Claim(s) 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Stojanovic, Milos (US-20190050648-A1, hereinafter simply referred to as Milos) in view of Barton, Theresa (US-20210311618-A1, hereinafter simply referred to as Barton).

Regarding dependent claims 7 and 14, Milos does not expressly teach:
wherein the image decoder applies a matrix completion algorithm to the sparse segmentation of the at least one target object to generate the recovered segmented target object.
Nevertheless, Barton teaches the concept of applying a matrix completion algorithm to the sparse segmentation of the at least one target object to generate the recovered segmented target object (See at least Barton, ¶ [0033], FIGS. 1, 2; "…The transformation system 160 may generate a low-rank matrix approximation of the convolution operator. In some example embodiments, the low-rank matrix approximation of the convolution operator enables the client device to use a convolutional neural network to generate modified versions of segmented portions of an image with fast inference speed, compact model size, and low energy consumption and apply the modified image segments to frames of a video stream to generate a modified video stream…").
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the known and similar device of Milos with the known technique, as disclosed in Barton, of applying a matrix completion algorithm to the sparse segmentation of the at least one target object to generate the recovered segmented target object. The modification would serve the desirable and advantageous purpose of enabling the client device to use a convolutional neural network to generate modified versions of segmented portions of an image with fast inference speed, compact model size, and low energy consumption, while enabling fast convolution-based image modification using efficient approximations of tensor projections to significantly reduce the computational time of the convolution operator, as discussed in Barton (See ¶ [0033]), thereby helping to improve the overall system robustness.
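For illustration only, the general idea of a low-rank matrix approximation, as named in the passage quoted from Barton, ¶ [0033], can be sketched as follows. Barton is not quoted as specifying how the approximation is computed, so this sketch substitutes a rank-1 approximation obtained by power iteration; the function name, the iteration count, and the sample matrix are hypothetical.

```python
import math

def rank1_approx(a, iters=100):
    """Best rank-1 approximation of a nonzero matrix (list of rows).

    Power iteration on A^T A finds the dominant right singular vector v;
    the approximation is then (A v) v^T.
    """
    rows, cols = len(a), len(a[0])
    v = [1.0] * cols  # assumed not orthogonal to the dominant singular vector
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

# Hypothetical 3x2 matrix whose rows are multiples of one another (rank 1),
# so the rank-1 approximation reproduces it up to rounding error.
matrix = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
approx = rank1_approx(matrix)
```

The rank-1 result is described by one column vector and one row vector (five numbers here) rather than all six matrix entries, which is the kind of compaction a low-rank approximation provides.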

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure. See the Notice of References Cited (PTO-892).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IDOWU O OSIFADE whose telephone number is (571)272-0864. The Examiner can normally be reached on Monday-Friday 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the Examiner’s Supervisor, Emily Terrell, can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. 
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at (866) 217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800) 786-9199 (IN USA OR CANADA) or (571) 272-1000.

/IDOWU O OSIFADE/Primary Examiner, Art Unit 2666