DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/1/21 has been entered.

Status of Claims
Claims 1, 13, and 18 are amended. Claims 1-20 are pending. 

Response to Arguments
Applicant’s arguments with respect to Examiner's rejections under 35 USC 102 and 103 have been considered but are not persuasive. Therefore, these rejections are maintained.
Regarding claim 1, Applicant asserts that the cited prior art does not teach "wherein the VAE-GAN comprises a shared latent space for generating each of the reconstructed pose vector data, the reconstructed depth map, and reconstructed images."
First, Applicant’s amended claim language is rejected as vague and indefinite under 35 USC 112(b) (see discussion below). For example, it is unclear whether (and how) the modifiers “reconstructed” and “reconstructed pose” are intended to substantively limit the terms “vector data”, “depth map”, and “images”. Similarly, the term “shared latent space” is rejected as vague and indefinite, as it is unclear whether expressly recited claim elements are intended to affirmatively “share” the latent space (and if so, which claim elements, and how?). Moreover, it is unclear whether the language “a shared latent space for generating each of the reconstructed pose vector data, the reconstructed depth map, and reconstructed images" is intended to affirmatively require performance of generating pose vector data, a depth map, or images, or whether this language is deliberately articulated as an expression of intended use.
Nonetheless, Ros Sanchez teaches “wherein the VAE-GAN comprises a shared latent space for generating each of the reconstructed pose vector data, the reconstructed depth map, and reconstructed images" (see e.g. at least para 5-11, 33, 38, Fig. 2-4, and related text, illustrating a latent space that is shared during the generation and training of reconstructed pose vector data, a reconstructed depth map, and reconstructed images).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.

Claim 1 recites: "A method comprising:
receiving an image from a camera of a vehicle;
providing the image to a variational autoencoder generative adversarial network (VAE-GAN);
receiving from the VAE-GAN reconstructed pose vector data and a reconstructed depth map based on the image; and
calculating simultaneous localization and mapping for the vehicle based on the reconstructed pose vector data and the reconstructed depth map;
wherein the VAE-GAN comprises a shared latent space for generating each of the reconstructed pose vector data, the reconstructed depth map, and reconstructed images."
This language is vague and indefinite for at least the following reasons:
Generally Unclear: 
The terms "reconstructed pose vector data", “reconstructed depth map”, and “reconstructed images” are vague and indefinite as the scope of these terms is not clearly articulated. Namely, it is unclear whether the term “reconstructed” is intended to require preliminary (and yet undefined) steps of constructing pose vector data, constructing a depth map, and/or constructing images. Likewise, it is unclear whether (and how) the terms “reconstructed pose vector data”, “reconstructed depth map”, and “reconstructed images” are intended to be directed to scope and subject matter different from the terms “pose vector data”, “depth map”, and “images”. Moreover, it is further unclear what constitutes "pose" vector data. For example, it is unclear whether this language is intended to be directed to a pose of an observed object, or whether this language is merely intended to suggest vector data observed from a pose or orientation of a camera/sensor.
The term “shared latent space” is vague and indefinite as it is unclear whether expressly recited claim elements are intended to affirmatively “share” the latent space (and if so, which claim elements, and how?).
Intended Use: The following language is vague and indefinite as it is unclear whether the scope of this language is intended to affirmatively require specific performance of specific functions or whether this language is deliberately articulated as an expression of intended use: 
"calculating simultaneous localization and mapping for the vehicle based on the reconstructed pose vector data and the reconstructed depth map"
“a shared latent space for generating each of the reconstructed pose vector data, the reconstructed depth map, and reconstructed images"
Accordingly, this language does not serve to patentably distinguish the claimed structure over that of the reference. See In re Pearson, 181 USPQ 641; In re Yanush, 177 USPQ 705; In re Finsterwalder, 168 USPQ 530; In re Casey, 152 USPQ 235; In re Otto, 136 USPQ 458; Ex parte Masham, 2 USPQ2d 1647.
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"A method comprising:
receiving an image from a camera of a vehicle;
providing the image to a variational autoencoder generative adversarial network (VAE-GAN);

calculating simultaneous localization and mapping [intended for the vehicle based on ];
wherein the VAE-GAN comprises a [intended for generating each of ]."
Claims 2-12 are further rejected as depending on this claim.

Claim 2 recites: "The method of claim 1, further comprising training the VAE-GAN, wherein training the VAE-GAN comprises:
providing a training image to an image encoder of the VAE-GAN, wherein the image encoder is configured to map the training image to a compressed latent representation of the training image;
providing training pose vector data based on the training image to a pose encoder of the VAE-GAN, wherein the pose encoder is configured to map the training pose vector data to a compressed latent representation of the training pose vector data; and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, wherein the depth encoder is configured to map the training depth map to a compressed latent representation of the training depth map."

35 USC 112(f): The following claim limitations invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: 
“an image encoder of the VAE-GAN, wherein the image encoder is configured to map the training image to a compressed latent representation of the training image”
“a pose encoder of the VAE-GAN, wherein the pose encoder is configured to map the training pose vector data to a compressed latent representation of the training pose vector data”
“a depth encoder of the VAE-GAN, wherein the depth encoder is configured to map the training depth map to a compressed latent representation of the training depth map”
However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. For example, see para 6, describing a computing device providing data to the VAE-GAN, wherein the VAE-GAN itself is not described as embodied on a computer but merely as a theoretical construct. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Intended Use: Alternatively, the following language is vague and indefinite as it is unclear whether the scope of this language is intended to affirmatively require specific performance of specific functions or whether this language is deliberately articulated as an expression of intended use: 
“an image encoder of the VAE-GAN, wherein the image encoder is configured to map the training image to a compressed latent representation of the training image”
“a pose encoder of the VAE-GAN, wherein the pose encoder is configured to map the training pose vector data to a compressed latent representation of the training pose vector data”
“a depth encoder of the VAE-GAN, wherein the depth encoder is configured to map the training depth map to a compressed latent representation of the training depth map”
Accordingly, this language does not serve to patentably distinguish the claimed structure over that of the reference. See In re Pearson, 181 USPQ 641; In re Yanush, 177 USPQ 705; In re Finsterwalder, 168 USPQ 530; In re Casey, 152 USPQ 235; In re Otto, 136 USPQ 458; Ex parte Masham, 2 USPQ2d 1647.
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 1, further comprising training the VAE-GAN, wherein training the VAE-GAN comprises:
[intended ];
providing training n [intended ]; and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, [wherein the depth encoder is intended ]."
Claims 3-6 are further rejected as depending on this claim.

Claim 3 recites: "The method of claim 2, wherein the VAE-GAN is trained utilizing a plurality of inputs in tandem, such that each of:
the image encoder and a corresponding image decoder;
the pose encoder and a corresponding pose decoder; and
the depth encoder and a corresponding depth decoder are trained in tandem utilizing the latent space of the VAE-GAN."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2 above. Moreover, this language is further rejected as vague and indefinite for at least the following reasons:
Generally Unclear: The language: “such that each of:
the image encoder and a corresponding image decoder;
the pose encoder and a corresponding pose decoder; and
the depth encoder and a corresponding depth decoder are trained in tandem utilizing the latent space of the VAE-GAN” is vague and indefinite as the scope of this limitation(s) is not clearly articulated. Namely, it is unclear whether this language is intended to be directed to (up to) four distinct limitations (delimited by colons or semicolons), or whether this language is directed to a single limitation. Accordingly, it is unclear whether this language is directed to a method, an apparatus, or a hybrid claim consisting of structural elements. Moreover, it is unclear what elements are intended to be “trained in tandem utilizing the latent space of the VAE-GAN” (e.g. “the depth encoder and a corresponding depth decoder”, or “the image encoder, the pose encoder, and/or the depth decoder”, or “the image encoder, an image decoder, the pose encoder, a pose decoder, the depth encoder, and/or a depth decoder”).
Inconsistent Terms: The language of the claims uses inconsistent and vague claim terms, such that it is unclear if two similarly worded terms, or in other cases, a specific term followed by a general term, are intended to refer to the same claim element, or whether these terms are intended to be interpreted as distinct claim elements, or further, whether use of a general term is intended to be construed broadly such that a broad term may or may not encompass the narrow term, and finally, when requisite antecedent basis is established for each of these terms as used herein. For example, the following terms are unclear as used in the claims:
“the latent space” (cl. 3) vs. “a shared latent space” (cl. 1)
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 2, wherein the VAE-GAN is trained utilizing a plurality of inputs in tandem, such that each of, an image decoder, an encoder , a , the depth encoder, and/or a depth decoder are trained in tandem utilizing a latent space of the VAE-GAN."

Claim 4 recites: "The method of claim 2, wherein each of the training image, the training pose vector data, and the training depth map share the latent space of the VAE-GAN."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-3 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 2, wherein each of the training image, the training a latent space of the VAE-GAN."

Claim 5 recites: "The method of claim 2, wherein the VAE-GAN comprises an encoded latent space vector that is applicable to each of the training image, the training pose vector data, and the training depth map."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-4 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 2, wherein the VAE-GAN comprises a latent space 

Claim 6 recites: "The method of claim 2, further comprising determining the training pose vector data based on the training image, wherein determining the training pose vector data comprises:
receiving a plurality of stereo images forming a stereo image sequence; and
calculating six Degree of Freedom pose vector data for successive images of the stereo image sequence using stereo visual odometry;
wherein the training image provided to the VAE-GAN comprises a single image of a stereo image pair of the stereo image sequence."
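This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claim 1 above.
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading: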
"The method of claim 2, further comprising determining the training 
receiving a plurality of stereo images forming a stereo image sequence; and
calculating six Degree of Freedom [intended for successive images of the stereo image sequence using stereo visual odometry];
wherein the training image provided to the VAE-GAN comprises a single image of a stereo image pair of the stereo image sequence."

Claim 7 recites: "The method of claim 1, wherein the camera of the vehicle comprises a monocular camera configured to capture a sequence of images of an environment of the vehicle, and wherein the image comprises a red-green-blue (RGB) image."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claim 1 above. Moreover, this language is further rejected as vague and indefinite for at least the following reasons:
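Antecedent Basis: The following terms lack proper antecedent basis: “the image”.
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading: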
"The method of claim 1, wherein the camera of the vehicle comprises a monocular camera configured to capture a sequence of images of an environment of the vehicle, and wherein an image comprises a red-green-blue (RGB) image."

Claim 8 recites: "The method of claim 1, wherein the VAE-GAN comprises an encoder opposite to a decoder, and wherein the decoder comprises a generative adversarial network (GAN) configured to generate an output, wherein the GAN comprises a GAN generator and a GAN discriminator."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 1, wherein the VAE-GAN comprises an encoder opposite to a decoder, and wherein the decoder comprises a generative adversarial network (GAN) [intended ]."

Claim 9 recites: "The method of claim 1, wherein the VAE-GAN comprises:
a trained image encoder configured to receive the image;
a trained pose decoder comprising a GAN configured to generate the reconstructed pose vector data based on the image; and
a trained depth decoder comprising a GAN configured to generate the reconstructed depth map based on the image."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2 and 8 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 1, wherein the VAE-GAN comprises:
a trained image encoder [intended ];
a trained [intended ]; and
a trained depth decoder comprising a GAN [intended ]."

Claim 10 recites: "The method of claim 1, wherein the VAE-GAN comprises:
an image encoder configured to map the image to a compressed latent representation;
a pose decoder comprising a GAN generator adversarial to a GAN discriminator;
a depth decoder comprising a GAN generator adversarial to a GAN discriminator; and
a latent space, wherein the latent space is common to each of the image encoder, the pose decoder, and the depth decoder."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2, and 8-9 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 1, wherein the VAE-GAN comprises:
an image encoder [intended ];
a 
a depth decoder comprising a GAN generator adversarial to a GAN discriminator; and
a latent space, wherein a latent space is common to each of the image encoder, a decoder, and the depth decoder."
Claim 11 is further rejected as depending on this claim.

Claim 11 recites: "The method of claim 10, wherein the latent space of the VAE-GAN comprises an encoded latent space vector utilized for each of the image encoder, the pose decoder, and the depth decoder."
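This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-5 and 8-10 above.
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading: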
"The method of claim 10, wherein the latent space of the VAE-GAN comprises a latent space [intended a decoder, and the depth decoder]."

Claim 12 recites: "The method of claim 1, wherein the reconstructed pose vector data comprises six Degree of Freedom pose data pertaining to the camera of the vehicle."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claim 1 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The method of claim 1, wherein 

Claim 13 recites: “Non-transitory computer-readable storage media storing instructions for executing by one or more processors, the instructions comprising:
receiving an image from a camera of a vehicle;
providing the image to a variational autoencoder generative adversarial network (VAE-GAN);
receiving from the VAE-GAN reconstructed pose vector data and a reconstructed depth map based on the image; and
calculating simultaneous localization and mapping for the vehicle based on the reconstructed pose vector data and the reconstructed depth map;
wherein the VAE-GAN comprises a shared latent space for generating each of the reconstructed pose vector data, the reconstructed depth map, and reconstructed images."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
“Non-transitory computer-readable storage media storing instructions [intended for executing by one or more processors, the instructions comprising:
receiving an image from a camera of a vehicle;
providing the image to a variational autoencoder generative adversarial network (VAE-GAN);
receiving from the VAE-GAN 
calculating simultaneous localization and mapping [intended for the vehicle based on a depth map];
[intended for generating each of a depth map, and ]]]."
Claims 14-17 are further rejected as depending on this claim.

Claim 14 recites: "The non-transitory computer-readable storage media of claim 13, wherein the instructions further comprise training the VAE-GAN, wherein training the VAE-GAN comprises:
providing a training image to an image encoder of the VAE-GAN, wherein the image encoder is configured to map the training image to a compressed latent representation in the latent space;
providing training pose vector data based on the training image to a pose encoder of the VAE-GAN, wherein the pose encoder is configured to map the training pose vector data to a compressed latent representation in the latent space; and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, wherein the depth encoder is configured to map the training depth map to a compressed latent representation in the latent space."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2 and 13 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:

providing a training image to an image encoder of the VAE-GAN, [wherein the image encoder is intended a latent space];
providing training an encoder of the VAE-GAN, [wherein an encoder is intended a latent space]; and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, [wherein the depth encoder is intended a latent space]."
Claims 15-16 are further rejected as depending on this claim.

Claim 15 recites: "The non-transitory computer-readable storage media of claim 14, wherein the instructions comprise training the VAE-GAN utilizing a plurality of inputs in tandem, such that each of:
the image encoder and a corresponding image decoder;
the pose encoder and a corresponding pose decoder; and
the depth encoder and a corresponding depth decoder are trained in tandem such that each of the training image, the training pose vector data, and the training depth map share the latent space of the VAE-GAN."

Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The non-transitory computer-readable storage media of claim 14, wherein the instructions comprise training the VAE-GAN utilizing a plurality of inputs in tandem, such that each of, an image decoder, an encoder , a , the depth encoder, and/or a depth decoder are trained in tandem such that each of the training image, the training a latent space of the VAE-GAN."

Claim 16 recites: "The non-transitory computer-readable storage media of claim 14, wherein the instructions further comprise calculating the training pose vector data based on the training image, wherein calculating the training pose vector data comprises:
receiving a plurality of stereo images forming a stereo image sequence; and
calculating six Degree of Freedom pose vector data for successive images of the stereo image sequence using stereo visual odometry;
wherein the training image provided to the VAE-GAN comprises a single image of a stereo image pair of the stereo image sequence."

Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The non-transitory computer-readable storage media of claim 14, wherein the instructions further comprise calculating the training 
receiving a plurality of stereo images forming a stereo image sequence; and
calculating six Degree of Freedom [intended for successive images of the stereo image sequence using stereo visual odometry];
wherein the training image provided to the VAE-GAN comprises a single image of a stereo image pair of the stereo image sequence."

Claim 17 recites: "The non-transitory computer-readable storage media of claim 13, wherein the VAE-GAN comprises an encoder opposite to a decoder, and wherein the decoder comprises a generative adversarial network (GAN) configured to generate an output, wherein the GAN comprises a GAN generator and a GAN discriminator."
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2, 8, and 13 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
[intended ]."

Claim 18 recites: "A system for simultaneous localization and mapping of a vehicle in an environment, the system comprising:
a monocular camera of a vehicle;
a vehicle controller in communication with the monocular camera, wherein the vehicle controller comprises one or more processors for executing instructions, wherein the instructions comprise:
receiving an image from the monocular camera of the vehicle;
providing the image to a variational autoencoder generative adversarial network (VAE-GAN);
receiving from the VAE-GAN reconstructed pose vector data based on the image;
receiving from the VAE-GAN a reconstructed depth map based on the image;
calculating simultaneous localization and mapping for the vehicle based on one or more of the reconstructed pose vector data and the reconstructed depth map;
wherein the VAE-GAN comprises a shared latent space for generating each of the reconstructed pose vector data, the reconstructed depth map, and reconstructed images."

Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"A system for simultaneous localization and mapping of a vehicle in an environment, the system comprising:
a monocular camera of a vehicle;
a vehicle controller in communication with the monocular camera, wherein the vehicle controller comprises one or more processors [intended for executing instructions, wherein the instructions comprise:
receiving an image from the monocular camera of the vehicle;
providing the image to a variational autoencoder generative adversarial network (VAE-GAN);
receiving from the VAE-GAN 
receiving from the VAE-GAN a 
calculating simultaneous localization and mapping [intended for the vehicle based on one or more of ];
wherein the VAE-GAN comprises a [intended for generating each of a depth map, and ]]."


Claim 19 recites: "The system of claim 18, wherein the VAE-GAN is trained and training the VAE-GAN comprises:
providing a training image to an image encoder of the VAE-GAN, wherein the image encoder is configured to map the training image to a compressed latent representation of the training image;
providing training pose vector data based on the training image to a pose encoder of the VAE-GAN, wherein the pose encoder is configured to map the training pose vector data to a compressed latent representation of the training pose vector data; and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, wherein the depth encoder is configured to map the training depth map to a compressed latent representation of the training depth map.”
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2, 13-14, and 18 above. 
Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading:
"The system of claim 18, wherein the VAE-GAN is trained and training the VAE-GAN comprises:
intended a training image];
providing training vector data based on the training image to an encoder of the VAE-GAN, [wherein an encoder is intended ]; and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, [wherein the depth encoder is intended ].”

Claim 20 recites: “The system of claim 18, wherein the VAE-GAN comprises:
an image encoder configured to map the image to a compressed latent representation;
a pose decoder comprising a GAN generator adversarial to a GAN discriminator;
a depth decoder comprising a GAN generator adversarial to a GAN discriminator; and
a latent space, wherein the latent space is common to each of the image encoder, the pose decoder, and the depth decoder.”
This language is also rejected as vague and indefinite for the same reasons discussed in the rejection of claims 1-2, 8-10, and 17-18 above. 
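Although the following language does not necessarily cure the issues discussed above, for purposes of examination under 35 USC 102 and 103, Examiner will interpret this language as reading: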
“The system of claim 18, wherein the VAE-GAN comprises:
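an image encoder [intended to map the image to a compressed latent representation];
a decoder comprising a GAN generator adversarial to a GAN discriminator;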
a depth decoder comprising a GAN generator adversarial to a GAN discriminator; and
a latent space, wherein a latent space is common to each of the image encoder, a decoder, and the depth decoder.”

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-6, 8-11, 13-15, and 17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ros Sanchez (US 2019/0354804 A1).

Regarding claim 1, Ros Sanchez discloses a method (see e.g. at least Abstract, Fig. 4-5, and related text) comprising:
receiving an image from a camera of a vehicle (e.g. at least camera 626, see e.g. at least p. 3, 77, Fig. 6, and related text);
providing the image to a variational autoencoder generative adversarial network (VAE-GAN) (e.g. at least VAE-GAN, generator 200, see e.g. at least p. 5-11, 32-33, Fig. 2, and related text);
receiving from the VAE-GAN (see e.g. at least p. 37, 57, 82-83, Fig. 2-3, 5-6, and related text, illustrating the generator 200 outputting a custom image/model to, e.g. at least a second model, a hierarchy/tree of models, a plurality of stacks, a chain, a memory, a database, a person) vector data (see e.g. at least p. 10-11, 24, 35, 77, 82, Fig. 6, and related text, wherein vector data is constructed from images of an area surrounding the vehicle obtained via camera 626, sensor system 620) and a depth map based on the image (e.g. at least one or more driving scene models, see e.g. at least Abstract, p. 11, 24, 35, 42, 50, 57, 82-83, Fig. 4-5, and related text, creating a map and determining the position and velocity of the vehicle 600, the location of obstacles, objects, or other environmental features, wherein the components of the model are created by object instances to vectors that correspond with identified characteristics in the latent space of an associated generator to form the custom image/model); and

wherein the VAE-GAN comprises a latent space [intended for generating each of the vector data, the depth map, and images] (see e.g. at least p. 7, 9-11, 21-24, 26, Fig. 1, 4-5, and related text).

Regarding claim 2, Ros Sanchez teaches that training the VAE-GAN (see e.g. at least p. 5-11, 32-33, Fig. 2, and related text) comprises:
providing a training image to an image encoder of the VAE-GAN, wherein the image encoder is [intended to map the training image to a compressed latent representation of the training image (see e.g. at least Fig. 4, and related text)];
providing training vector data based on the training image to an encoder of the VAE-GAN, wherein an encoder is [intended to map the training vector data to a compressed latent representation of the training vector data (see e.g. at least p. 5-11, Fig. 5, and related text)]; and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, [wherein the depth encoder is intended to map the training depth map to a compressed latent representation of the training depth map] (id., see also e.g. at least p. 7, 22, 24, 33-35).

Regarding claim 3, Ros Sanchez teaches that the VAE-GAN is trained utilizing a plurality of inputs in tandem, such that each of the image encoder (e.g. at least VAE), an image decoder (e.g. at least GAN, see e.g. at least p. 33), an encoder (id.), a decoder (id.), the depth encoder (id.), and/or a depth decoder (id.) are trained in tandem utilizing a latent space of the VAE-GAN (id.).

Regarding claim 4, Ros Sanchez teaches that each of the training image, the training vector data, and the training depth map share a latent space of the VAE-GAN (see e.g. at least p. 5-11, Fig. 4, and related text).

Regarding claim 5, Ros Sanchez teaches that the VAE-GAN comprises a latent space that is applicable to each of the training image, the training vector data, and the training depth map (see e.g. at least p. 5-11, Fig. 4, and related text).

Regarding claim 6, Ros Sanchez teaches determining the training vector data based on the training image, wherein determining the training vector data comprises:
receiving a plurality of stereo images forming a stereo image sequence (see e.g. at least p. 72, Fig. 6, and related text); and
calculating vector data [intended for successive images of the stereo image sequence using stereo visual odometry] (id.);
wherein the training image provided to the VAE-GAN comprises a single image of a stereo image pair of the stereo image sequence (id.).

Accordingly, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teaching of Ros Sanchez by calculating six Degree of Freedom vector data [intended for successive images of the stereo image sequence using stereo visual odometry] as taught by the combination of Ros Sanchez and Pirchheim in order to provide improved tracking and mapping in a reliable manner (Pirchheim: p. 4).

Regarding claim 8, Ros Sanchez teaches that the VAE-GAN comprises an encoder opposite to a decoder, and wherein the decoder comprises a generative adversarial network (GAN) [intended to generate an output, wherein the GAN comprises a GAN generator and a GAN discriminator] (see e.g. at least p. 33, Fig. 2, and related text).

Regarding claim 9, Ros Sanchez teaches that the VAE-GAN (see e.g. at least p. 33) comprises:
a trained image encoder [intended to receive the image] (see e.g. at least p. 5-11, 32-33, 77, Fig. 2, 4, and related text);
a trained decoder comprising a GAN [intended to generate vector data based on the image] (see e.g. at least p. 5-11, Fig. 2, and related text); and


Regarding claim 10, Ros Sanchez teaches that the VAE-GAN comprises:
an image encoder [intended to map the image to a compressed latent representation] (see e.g. at least p. 7, 22, 24, 33-35);
a decoder comprising a GAN generator adversarial to a GAN discriminator (see e.g. at least p. 5-11, Fig. 2, and related text);
a depth decoder comprising a GAN generator adversarial to a GAN discriminator (id., see also e.g. at least p. 7, 22, 24, 33-35); and
a latent space, wherein a latent space is common to each of the image encoder, a decoder, and the depth decoder (id., see also e.g. at least Fig. 4, and related text).

Regarding claim 11, Ros Sanchez teaches that the latent space of the VAE-GAN comprises a latent space [intended for each of the image encoder, a decoder, and the depth decoder] (see e.g. at least p. 5-11, Fig. 4, and related text).

Regarding claim 13, Ros Sanchez teaches non-transitory computer-readable storage media storing instructions [intended for executing by one or more processors (e.g. at least memory, processors, see e.g. at least p. 9, 29), the instructions comprising:
receiving an image from a camera of a vehicle (e.g. at least camera 626, see e.g. at least p. 3, 77, Fig. 6, and related text);

receiving from the VAE-GAN (see e.g. at least p. 37, 57, 82-83, Fig. 2-3, 5-6, and related text, illustrating the generator 200 outputting a custom image/model to, e.g. at least a second model, a hierarchy/tree of models, a plurality of stacks, a chain, a memory, a database, a person) vector data (see e.g. at least p. 10-11, 24, 35, 77, 82, Fig. 6, and related text, wherein vector data is constructed from images of an area surrounding the vehicle obtained via camera 626, sensor system 620) and a depth map based on the image (e.g. at least one or more driving scene models, see e.g. at least Abstract, p. 11, 24, 35, 42, 50, 57, 82-83, Fig. 4-5, and related text, creating a map and determining the position and velocity of the vehicle 600, the location of obstacles, objects, or other environmental features, wherein the components of the model are created by object instances to vectors that correspond with identified characteristics in the latent space of an associated generator to form the custom image/model); and
calculating simultaneous localization and mapping [intended for the vehicle based on vector data and a depth map] (see e.g. at least p. 57, 82-83, Fig. 5-6, and related text, using the data to generate and output one or more driving scene models, including creating a map and determining the position and velocity of the vehicle 600, the location of obstacles, objects, or other environmental features);
wherein the VAE-GAN comprises a latent space [intended for generating each of vector data, a depth map, and images]]] (see e.g. at least p. 7, 9-11, 21-24, 26, Fig. 1, 4-5, and related text).

Regarding claim 14, Ros Sanchez teaches that the instructions further comprise training the VAE-GAN, wherein training the VAE-GAN (see e.g. at least p. 5-11, 29, 32-33, Fig. 2, and related text) comprises:
providing a training image to an image encoder of the VAE-GAN, [wherein the image encoder is intended to map the training image to a compressed latent representation in a latent space] (see e.g. at least Fig. 4, and related text);
providing training vector data based on the training image to an encoder of the VAE-GAN, [wherein an encoder is intended to map the training vector data to a compressed latent representation in a latent space] (see e.g. at least p. 5-11, Fig. 5, and related text); and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, [wherein the depth encoder is intended to map the training depth map to a compressed latent representation in a latent space] (id., see also e.g. at least p. 7, 22, 24, 33-35).

Regarding claim 15, Ros Sanchez teaches that the instructions comprise training the VAE-GAN utilizing a plurality of inputs in tandem, such that each of the image encoder (see e.g. at least p. 33), an image decoder (id.), an encoder (id.), a decoder (id.), the depth encoder (id.), and/or a depth decoder (id.) are trained in tandem such that each of the training image, the training vector data, and the training depth map share a latent space of the VAE-GAN (id.).

Regarding claim 17, Ros Sanchez teaches that the VAE-GAN comprises an encoder opposite to a decoder, and wherein the decoder comprises a generative adversarial network (GAN) [intended to generate an output, wherein the GAN comprises a GAN generator and a GAN discriminator] (see e.g. at least p. 5-11, 32-33, 77, Fig. 2, 4, and related text).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7, 12, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ros Sanchez (US 2019/0354804 A1) in view of Pirchheim (US 2019/0354804 A1).

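Regarding claim 7, Pirchheim teaches limitations not expressly disclosed by Ros Sanchez including namely: that the camera of the vehicle comprises a monocular camera configured to capture a sequence of images of an environment of the vehicle, and wherein the image comprises a red-green-blue (RGB) image.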
Accordingly, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teaching of Ros Sanchez by configuring that the camera of the vehicle comprises a monocular camera configured to capture a sequence of images of an environment of the vehicle, and wherein the image comprises a red-green-blue (RGB) image as taught by Pirchheim in order to provide improved tracking and mapping in a reliable manner (Pirchheim: p. 4).

Regarding claim 12, Pirchheim teaches limitations not expressly disclosed by Ros Sanchez including namely: that vector data comprises six Degree of Freedom data pertaining to the camera of the vehicle (see e.g. at least Abstract, p. 3, 27, 32, Fig. 2, 5, and related text).
Accordingly, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teaching of Ros Sanchez by configuring that vector data comprises six Degree of Freedom data pertaining to the camera of the vehicle as taught by Pirchheim in order to provide improved tracking and mapping in a reliable manner (Pirchheim: p. 4).

Regarding claim 16, Ros Sanchez teaches that the instructions further cause the one or more processors to calculate the training vector data based on the training image, wherein calculating the training vector data comprises:

calculating vector data [intended for successive images of the stereo image sequence using stereo visual odometry] (id.);
wherein the training image provided to the VAE-GAN comprises a single image of a stereo image pair of the stereo image sequence (id.).
Additionally, Pirchheim teaches limitations not expressly disclosed by Ros Sanchez including namely: calculating six Degree of Freedom vector data [intended for successive images of the image sequence using stereo visual odometry] (see e.g. at least p. 3, 5-8, 28).
Accordingly, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teaching of Ros Sanchez by calculating six Degree of Freedom vector data [intended for successive images of the stereo image sequence using stereo visual odometry] as taught by the combination of Pirchheim and Ros Sanchez in order to provide improved tracking and mapping in a reliable manner (Pirchheim: p. 4).

Regarding claim 18, Ros Sanchez discloses a system for simultaneous localization and mapping of a vehicle in an environment (see e.g. at least Abstract), the system comprising:
a camera of a vehicle (e.g. at least camera 626, see e.g. at least p. 3, 77, Fig. 6, and related text);

receiving an image from the camera of the vehicle (e.g. at least camera 626, see e.g. at least p. 3, 77, Fig. 6, and related text);
providing the image to a variational autoencoder generative adversarial network (VAE-GAN) (e.g. at least VAE-GAN, generator 200, see e.g. at least p. 5-11, 32-33, Fig. 2, and related text);
receiving from the VAE-GAN (see e.g. at least p. 37, 57, 82-83, Fig. 2-3, 5-6, and related text, illustrating the generator 200 outputting a custom image/model to, e.g. at least a second model, a hierarchy/tree of models, a plurality of stacks, a chain, a memory, a database, a person) vector data based on the image (see e.g. at least p. 10-11, 24, 35, 77, 82, Fig. 6, and related text, wherein vector data is constructed from images of an area surrounding the vehicle obtained via camera 626, sensor system 620);
receiving from the VAE-GAN a depth map based on the image (e.g. at least one or more driving scene models, see e.g. at least Abstract, p. 11, 24, 35, 42, 50, 57, 82-83, Fig. 4-5, and related text, creating a map and determining the position and velocity of the vehicle 600, the location of obstacles, objects, or other environmental features, wherein the components of the model are created by object instances to vectors that correspond with identified characteristics in the latent space of an associated generator to form the custom image/model);
calculating simultaneous localization and mapping [intended for the vehicle based on one or more of vector data and a depth map] (see e.g. at least p. 57, 82-83, Fig. 5-6, and related text, using the data to generate and output one or more driving scene models, including creating a map and determining the position and velocity of the vehicle 600, the location of obstacles, objects, or other environmental features);
wherein the VAE-GAN comprises a latent space [intended for generating each of vector data, a depth map, and images]] (see e.g. at least p. 7, 9-11, 21-24, 26, Fig. 1, 4-5, and related text).
Additionally, Pirchheim teaches limitations not expressly disclosed by Ros Sanchez including namely: a monocular camera of a vehicle (see e.g. at least Abstract).
Accordingly, it would have been obvious to one of ordinary skill in the art at the time of the invention to modify the teaching of Ros Sanchez by configuring a monocular camera of a vehicle; a vehicle controller in communication with the monocular camera, wherein the vehicle controller comprises one or more processors [intended for executing instructions, wherein the instructions comprise: receiving an image from the monocular camera of the vehicle] as taught by the combination of Ros Sanchez and Pirchheim in order to provide improved tracking and mapping in a reliable manner (Pirchheim: p. 4).

Regarding claim 19, Modified Ros Sanchez teaches that the VAE-GAN is trained and training the VAE-GAN comprises:

providing training vector data based on the training image to an encoder of the VAE-GAN, [wherein an encoder is intended to map the training vector data to a compressed latent representation of the training vector data] (Ros Sanchez: see e.g. at least p. 5-11, Fig. 5, and related text); and
providing a training depth map based on the training image to a depth encoder of the VAE-GAN, [wherein the depth encoder is intended to map the training depth map to a compressed latent representation of the training depth map] (Ros Sanchez: id., see also e.g. at least p. 7, 22, 24, 33-35).

Regarding claim 20, Modified Ros Sanchez teaches that the VAE-GAN comprises:
an image encoder [intended to map the image to a compressed latent representation] (Ros Sanchez: see e.g. at least p. 7, 22, 24, 33-35);
a decoder comprising a GAN generator adversarial to a GAN discriminator (Ros Sanchez: see e.g. at least p. 5-11, Fig. 2, and related text);
a depth decoder comprising a GAN generator adversarial to a GAN discriminator (Ros Sanchez: id., see also e.g. at least p. 7, 22, 24, 33-35); and


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES J HAN whose telephone number is (571)270-3980.  The examiner can normally be reached on M-Th and every other F (7:30 AM - 5 PM).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christian Chace can be reached on 571-272-4190.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHARLES J HAN/Primary Examiner, Art Unit 3662