DETAILED ACTION

Notice of Pre-AIA or AIA Status
	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
	
	This action is in response to REMARKS filed 06/17/2022. Claims 1 and 5-20 of US Application No. 16/857,083, filed on 09/24/2021, are currently pending and have been examined. Claims 1, 13, 19, and 20 have been amended and claim 21 is new.

Continued Examination Under 37 CFR 1.114
	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/17/2022 has been entered.
 
Response to Arguments
	The Applicant has not addressed the 35 U.S.C. 112(f) interpretation of claims 17 and 18. Therefore, the 112(f) interpretation is maintained.
	
	The Applicant’s arguments with respect to the rejection of claims 1 and 5-20 under 35 U.S.C. §103 have been fully considered and are persuasive. Therefore, the previous rejections are withdrawn.

	Applicant’s arguments with respect to the rejection of the dependent claims rest on the allowability of the amended independent claims. The Examiner finds these arguments unpersuasive, for the reasons discussed below.

	With respect to amended claim 1, Applicant argues:

“While Johns uses the term “merge” or “fusing,” there is no teaching or suggestion of using any features in one image to align with any features in another image.”
“Not only does Johns lack the disclosure of any aligning, but Johns never meant to perform any aligning in the first place.”
“Johns is silent about twilight straight lines and twilight urban lights.”

	With respect to the first argument, Applicant further states: “Firstly, in Johns, the two images are identical in their framing. Moreover, a visual inspection of the fused image (an annotated version of which is shown below) shows the opposite. For example, all of the “window lights” in the nighttime image appear completely misaligned with the “windows” in the daytime image.” 

	Here, the Applicant states that the framing of the images in Johns is identical, indicating that no aligning needs to take place. Applicant also argues that the alignment in the merged image is not correct, which indicates that the framing is not identical. These two arguments contradict each other, and the slightly incorrect positions of some of the objects in the merged image support the Examiner’s initial position that the two images being merged are not identical in their framing. Therefore, the Examiner will continue to interpret Johns as not having images with identical framing. (Further support for non-identical framing is shown in Fig. 3, Fig. 5, Fig. 6, and Fig. 8.)

	With respect to the first and second arguments, the Applicant argues that Johns does not perform any aligning, but instead “merges” or “fuses” the images.

	While the Examiner does not agree that the fusing in Johns does not include aligning, to further prosecution the Examiner has made a new rejection below.

	With respect to the third argument, Johns includes identifying and matching features from multiple times of day, including daytime, twilight, and nighttime. Therefore, Johns discloses matching twilight lights and features (see Fig. 8). As stated in the prior Office action, Bosse discloses using straight line features. Therefore, the combination of Johns and Bosse discloses twilight lights and straight lines.
		
Claim Interpretation
	The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

	The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:

(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 

(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 

(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

	This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “…processing unit configured to execute…” in claims 17 and 18.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.

For instance, the processing unit will be interpreted in light of pg. 23 ¶20 of the instant specification.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claims 1, 5-9, 11, 12, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Johns et al. (Feature Co-Occurrence Maps: Appearance-based Localisation Throughout the Day, “Johns”) in view of Sharma (A Review: Image Fusion Techniques and Applications, “Sharma”) and in further view of Bosse et al. (Vanishing Points and 3-D Lines from Omnidirectional Video, “Bosse”).

	Regarding claim 1, Johns discloses a method for appearance-based localisation over the course of the day and teaches: 

generating a map comprising daytime features and nighttime features, wherein a position of nighttime features relative to the daytime features is determined by at least one image captured during twilight. (Matrix M, i.e., a map, is then created by merging co-occurrence statistics, i.e., features, between all images captured in adjacent tours. For example, the statistics from the tour at 4pm, i.e., a twilight time, are merged with the tour at 2pm, i.e., a daytime, to create one set of “virtual” co-occurrence statistics, and also with the tour at 6pm, i.e., a nighttime, to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating Between Different Times of Day and Fig. 6)

the method further comprising: capturing the at least one image during twilight and extracting twilight visual features from the at least one image captured during twilight, wherein the twilight visual features (images are captured throughout all times of day, e.g., 2pm, 4pm, 6pm, 7pm, 8pm, and 10pm - See at least Fig. 8. The features in these images are extracted and assigned to the closest visual word in the feature space, with all visual words belonging to a single visual dictionary - See at least pg. 3213, §A. Image Quantisation) comprise twilight straight lines and/or twilight urban lights, (the extracted features can include urban lights - See at least Fig. 2-Visual Word D)

[fusing] twilight visual features with daytime features using at least one commonality between the twilight [features] and the daytime features; (Feature Co-occurrence Maps (Coco-Map) that exploits the discriminative power of local features, and still allows for location recognition at multiple times of the day, i.e., twilight and daytime. This is achieved by learning the co-occurrence statistics of features in quantised feature space and quantised image space, and fusing together these statistics from different times of the day into one compact model for each location. Query images are then compared to each location model by finding groups of features in the query image that have also co-occurred, with the same spatial relationships, in one of the training images for that location - See at least pg. 3212, §I. Introduction, ¶4; Figure 1(b) shows daytime features, e.g., the tree on the right side of the picture, not available during the nighttime, used to facilitate location estimation)

[fusing] twilight visual features with nighttime features using at least one commonality between the twilight urban lights and the nighttime features; and (the system fuses together image data from multiple times of the day in one compact model. This image data includes street lights, i.e., urban lights, during twilight and nighttime - See at least pg. 3212, §I. Introduction and  Fig. 1)

[fusing] at least one of the twilight visual features with (i) at least one daytime feature that is not present in the twilight image, and/or (ii) at least one nighttime feature that is not present in the twilight image. (The appearance of locations between these times is, however, not explicit, but we can estimate these appearances by interpolating between those images we do have. This helps to allow for performance at any time during the day, and not just around the times at which the training tours were captured. We achieve this by merging tours that are adjacent in time, and updating the word-co-occurrence matrix M based on the merged set of features for each location. Consider two training images Xit0 and Xit2 captured at location i at times t0 and t2 respectively. Due to the smooth nature of natural illumination changes, we can assume that any feature that occurs in both Xit0 and Xit2 is also likely to appear in the hypothetical image Xit1 captured at time t1. A feature that occurs in Xit0, i.e., a daytime image, but not in Xit2, i.e., a twilight image, may also still appear in Xit1, up until a time just before t2. Merging the co-occurrence statistics at t0 and t2 in this way maximizes the recall of feature correspondences at all times between t0 and t2. Matrix M is then created by merging co-occurrence statistics between all images captured in adjacent tours. For example, the statistics from the tour at 4pm are merged with the tour at 2pm to create one set of “virtual” co-occurrence statistics, and also with the tour at 6pm to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating between different times of day)
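For illustration only, the merging of adjacent tours described in the cited passage can be sketched as follows. This is a minimal sketch and not Johns's implementation: it assumes that each tour is reduced to a list of visual-word labels, that co-occurrence is recorded as unordered word pairs, and that spatial words are omitted. The names `cooccurrence_pairs`, `merge_adjacent_tours`, and the word labels are hypothetical.

```python
from itertools import combinations


def cooccurrence_pairs(words):
    """Return the unordered pairs of distinct visual words seen together."""
    return {tuple(sorted(pair)) for pair in combinations(set(words), 2)}


def merge_adjacent_tours(tours):
    """Pool co-occurrence pairs from time-ordered tours of one location.

    tours: list of (time_label, visual_words), ordered by capture time.
    Each adjacent pair of tours is merged into one "virtual" feature set
    (an assumption standing in for the merged set of features), and the
    co-occurrence pairs of every virtual set are combined.
    """
    combined = set()
    for (_, earlier), (_, later) in zip(tours, tours[1:]):
        virtual_features = list(earlier) + list(later)
        combined |= cooccurrence_pairs(virtual_features)
    return combined


# Hypothetical word labels for a single location.
tours = [
    ("2pm", ["tree", "edge"]),    # daytime tour
    ("4pm", ["edge", "lamp"]),    # twilight tour
    ("6pm", ["lamp", "window"]),  # nighttime tour
]
M = merge_adjacent_tours(tours)
```

In this toy example the daytime-only word "tree" becomes paired with the twilight word "lamp", and "lamp" becomes paired with the nighttime-only word "window", while "tree" and "window" are never paired directly. The 4pm tour is what links the two.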

	Johns discloses fusing images together and determining the spatial relationship between common features within the images. Johns does not explicitly state that the fusing process includes aligning the images/features in the image. However, Sharma discloses image fusion techniques and applications and teaches: 

aligning [] visual features with [] features [] (the steps for image fusion are: image input, image registration, image resampling, image fusion. These steps result in a final fused image - See at least Fig. 1)

	In summary, Johns discloses identifying features from multiple images, determining the spatial relationship between the features in the images, and then fusing the images together into one image containing features from the multitude of images. Johns does not explicitly disclose that the process of fusing includes an alignment step. However, Sharma discloses image fusion techniques and applications and teaches that fusion techniques, including feature level fusion, include the processing step of image registration, i.e., alignment of the images.

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns to provide for the image registration, as taught in Sharma, to help sharpen the images, improve geometric corrections, enhance certain features that are not visible in either of the images, replace defective data, and complement the data sets for better decision making. (At Sharma pg. 1082, §I. Introduction)
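For illustration only, the sequence of steps attributed to Sharma (image input, image registration, image resampling, image fusion) can be sketched as below. This is a toy sketch under stated assumptions, not a method taken from Sharma: images are small 2-D lists of intensities, registration is a brute-force search over integer translations that minimizes the sum of squared differences, and fusion is a pixel-wise maximum. All function names are hypothetical.

```python
def shift(img, dy, dx, fill=0):
    """Resample img by an integer translation (dy, dx), padding with fill."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def ssd(a, b):
    """Sum of squared pixel differences between two same-sized images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def register(reference, moving, max_shift=2):
    """Find the integer translation that best aligns moving to reference."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates, key=lambda s: ssd(reference, shift(moving, *s)))


def fuse(reference, moving):
    """Register, resample, then fuse by taking the pixel-wise maximum."""
    dy, dx = register(reference, moving)
    aligned = shift(moving, dy, dx)
    return [[max(p, q) for p, q in zip(ra, rb)]
            for ra, rb in zip(reference, aligned)]


# Input step: two 5x5 images sharing one bright feature at different offsets.
reference = [[0] * 5 for _ in range(5)]
reference[2][2] = 9          # shared feature in the reference image
moving = [[0] * 5 for _ in range(5)]
moving[3][1] = 9             # same feature, displaced
moving[1][1] = 5             # feature present only in the moving image
offset = register(reference, moving)
fused = fuse(reference, moving)
```

Here the shared bright pixel drives the registration, and the feature that appears only in the second image is carried into the fused result at its aligned position.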
	
	Johns further teaches that the features may include the edge of buildings (see Fig. 2 Visual Word B). Johns does not explicitly teach the features are straight lines. However, Bosse discloses vanishing points and 3D lines from omnidirectional video and teaches: 

at least one commonality between [] straight lines (the system tracks local, parallel 3D line segments that share a common vanishing point. These segments have six DOFs, which we partition into three groups: two for the direction of the line, two for the perpendicular offset of the line from the origin, and two for the distances of the endpoints along the line. The system updates 3D lines by projecting them into the current view, and comparing them with the corresponding line extracted from the current view, i.e., finding commonality between them - See at least pg. 514, § 2.3 Mapping 3-D Lines)

	In summary, Johns discloses using scene features, some of which will be available only during specific times of day due to different illumination factors. These scene features include at least a section of an edge of a building. Johns further teaches that their system would be applicable to SLAM methods. Johns does not explicitly teach that the features it extracts include straight lines. However, Bosse discloses vanishing points and 3D lines from omnidirectional video and teaches defining landmarks with vanishing points and 3D lines. Bosse further teaches that their system is in the field of SLAM.

 	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns and Sharma to provide for the estimation of the 3D position and orientation of scene landmarks (VPs and 3D lines), as taught in Bosse, to naturally construct a persistent map of scene landmarks to greatly ease the data association problem. (At Bosse pg. 513, §1. Introduction, ¶ 3)
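For illustration only, the six degrees of freedom attributed to Bosse's 3D line segments (two for direction, two for perpendicular offset from the origin, two for endpoint distances along the line) can be sketched as below. The concrete parameterization is an assumption made for this sketch, since the cited passage does not spell it out: direction is given as azimuth and elevation angles, and the offset as two coordinates in a plane perpendicular to the direction. The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LineSegment3D:
    azimuth: float    # direction DOF 1 (radians)
    elevation: float  # direction DOF 2 (radians)
    offset_u: float   # perpendicular offset DOF 1
    offset_v: float   # perpendicular offset DOF 2
    s_start: float    # distance of first endpoint along the line
    s_end: float      # distance of second endpoint along the line

    def basis(self):
        """Return the unit direction d and two unit vectors orthogonal to it."""
        ca, sa = math.cos(self.azimuth), math.sin(self.azimuth)
        ce, se = math.cos(self.elevation), math.sin(self.elevation)
        d = (ce * ca, ce * sa, se)
        e1 = (-sa, ca, 0.0)
        e2 = (-se * ca, -se * sa, ce)  # equals d x e1
        return d, e1, e2

    def endpoints(self):
        """Return the two 3D endpoints of the segment."""
        d, e1, e2 = self.basis()
        # Point on the line closest to the origin.
        p0 = tuple(self.offset_u * a + self.offset_v * b
                   for a, b in zip(e1, e2))
        return tuple(
            tuple(p + s * c for p, c in zip(p0, d))
            for s in (self.s_start, self.s_end)
        )


# A segment parallel to the x-axis, offset by (2, 3) in the y-z plane.
seg = LineSegment3D(0.0, 0.0, 2.0, 3.0, -1.0, 4.0)
start, end = seg.endpoints()
```

Segments that share the same two direction parameters are parallel and therefore share a vanishing point, which is the kind of commonality the cited passage relies on.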

	Regarding claim 5, Johns further teaches:

wherein the map comprising daytime features and nighttime features is generated by adding to a provided map any of: twilight visual features; nighttime features; and daytime features. (Matrix M, i.e., a map, is then created by merging co-occurrence statistics, i.e., features, between all images captured in adjacent tours. For example, the statistics from the tour at 4pm, i.e., a twilight time, are merged with the tour at 2pm, i.e., a daylight time, to create one set of “virtual” co-occurrence statistics, and also with the tour at 6pm, i.e., a nighttime, to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - i.e., the features in each of the time periods are added to create a representation of features throughout daytime, twilight, and nighttime - See at least pg. 3215, §E. Interpolating Between Different Times of Day and Fig. 6)

	Regarding claim 6, Johns further teaches:

wherein visual features related to a location are added to the map by capturing at least one image on the location; (We adopt the Bag-Of-Words model and quantise local features, i.e., visual features, such that each is assigned to the closest visual word in feature space, with all visual words belonging to a single visual dictionary. We then quantise images spatially in a similar manner, by dividing up images into regular grids of square spatial words, each belonging to a single spatial dictionary - See at least pg. 3213, §A. Image Quantisation and Fig. 2)

extracting visual features from the at least one image; (the visual features are extracted in order to quantise them and to assign them to the closest visual word feature space - See at least pg. 3213, §A. Image Quantisation and Fig. 2)

estimating the location and associating the location to the visual features; and (Given a query image Xq, local features are extracted and quantised, as before. Using an inverted-file index as is typical in image-retrieval tasks, each query visual word points to all database locations which contain that visual word in the word-co-occurrence matrix for that location. Candidate correspondences between features in the query image and features in the database location are then generated for each location - See at least pg. 3214-3215, §C. Location Recognition)

adding the visual features associated with respective location to the map. (Thus far, Mi only represents the appearance of location yi at one instance in time. The ability of our system to deal with illumination changes comes into play as the co-occurrence statistics are updated through further images captured of location yi at different times of the day. Given one of these images, pairs of visual words and their associated spatial words are used to update Mi accordingly. If any observed visual word is not already represented in Mi, it is added along with the corresponding spatial word - See at least pg. 3213-3214 §B. Location Models, ¶4)
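For illustration only, the quantisation, map update, and inverted-file lookup described in the passages cited for claim 6 can be sketched as below. This is a simplified sketch and not Johns's implementation: it assumes 2-D descriptors, a three-word dictionary, and a plain vote count per location, and it omits the spatial words and co-occurrence matrix entirely. The names `nearest_word`, `LocationIndex`, and the location labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nearest_word(descriptor, dictionary):
    """Assign a descriptor to the closest visual word (Euclidean distance)."""
    return min(dictionary, key=lambda w: math.dist(descriptor, dictionary[w]))


class LocationIndex:
    """Toy map: an inverted file from visual word to locations containing it."""

    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.inverted = defaultdict(set)

    def add_image(self, location, descriptors):
        """Quantise an image's descriptors and add them under its location."""
        for d in descriptors:
            self.inverted[nearest_word(d, self.dictionary)].add(location)

    def query(self, descriptors):
        """Return the location sharing the most visual words with the query."""
        votes = Counter()
        for word in {nearest_word(d, self.dictionary) for d in descriptors}:
            for location in self.inverted.get(word, ()):
                votes[location] += 1
        return votes.most_common(1)[0][0] if votes else None


# Hypothetical dictionary of visual-word centres and two mapped locations.
dictionary = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
index = LocationIndex(dictionary)
index.add_image("corner", [(0.5, 0.2), (9.0, 1.0)])  # quantised to A and B
index.add_image("bridge", [(1.0, 9.0)])              # quantised to C
best = index.query([(9.5, 0.1), (0.1, 0.3)])
```

A later image of the same location taken at a different time of day would be added with another `add_image` call, which corresponds loosely to adding newly observed visual words to the location model.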

	Regarding claim 7, Johns further teaches:

wherein estimation of the location is facilitated by comparing visual features extracted from at least one image captured on the location with visual features comprised by the map used to estimate the location. (Given a query image Xq, i.e., at least one image captured on location, local features are extracted and quantised, as before. Using an inverted-file index as is typical in image-retrieval tasks, each query visual word points to all database locations which contain that visual word in the word-co-occurrence matrix for that location, i.e., visual features comprised by the map used to estimate location. Candidate correspondences between features in the query image and features in the database location are then generated for each location - See at least pg. 3214-3215, §C. Location Recognition)

	Regarding claim 8, Johns further teaches:

wherein estimation of the location during daytime is facilitated by daytime features. (Feature Co-occurrence Maps (Coco-Map) that exploits the discriminative power of local features, and still allows for location recognition at multiple times of the day. This is achieved by learning the co-occurrence statistics of features in quantised feature space and quantised image space, and fusing together these statistics from different times of the day into one compact model for each location. Query images are then compared to each location model by finding groups of features in the query image that have also co-occurred, with the same spatial relationships, in one of the training images for that location - See at least pg. 3212, §I. Introduction, ¶4; Figure 1(b) shows daytime features, e.g., the tree on the right side of the picture, not available during the nighttime, used to facilitate location estimation)

	Regarding claim 9, Johns further teaches:

wherein estimation of the location during low light conditions is facilitated by nighttime features. (Feature Co-occurrence Maps (Coco-Map) that exploits the discriminative power of local features, and still allows for location recognition at multiple times of the day. This is achieved by learning the co-occurrence statistics of features in quantised feature space and quantised image space, and fusing together these statistics from different times of the day into one compact model for each location. Query images are then compared to each location model by finding groups of features in the query image that have also co-occurred, with the same spatial relationships, in one of the training images for that location - See at least pg. 3212, §I. Introduction, ¶4; Figure 1(b) shows nighttime features, e.g., the streetlight and illuminated windows, not available during the daytime, used to facilitate location estimation)

	Regarding claim 11, Johns further teaches:

wherein the daytime features comprise a plurality of [features] (Figure 1(b) shows daytime features, e.g., the tree on the right side of the picture, not available during the nighttime, used to facilitate location estimation)

	Johns does not explicitly teach but Bosse further teaches:

[] straight lines (Given a sequence of omni-directional images and detected linear features, our task is to estimate the 3D position and orientation of scene landmarks (VPs and 3D lines), and the pose of the camera as each image was acquired - See at least pg. 513, §2. The Algorithm)

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns to provide for the estimation of the 3D position and orientation of scene landmarks (VPs and 3D lines), as taught in Bosse, to naturally construct a persistent map of scene landmarks to greatly ease the data association problem. (At Bosse pg. 513, §1. Introduction, ¶ 3)

	Regarding claim 12, Johns further teaches:

wherein the nighttime features comprise urban lights. (Streetlights and illuminated building windows are used during the nighttime to determine location - See at least Fig. 1)

	Regarding claim 19, Johns discloses a method for appearance-based localisation over the course of the day and teaches:

obtaining (i) at least one daytime image captured during daylight and comprising one or more daytime features, (ii) at least one nighttime image captured during nighttime and comprising one or more nighttime features, and (iii) at least one twilight image captured during twilight, said at least one twilight image comprising one or more twilight visual features; (The appearance of locations between these times is, however, not explicit, but we can estimate these appearances by interpolating between those images we do have. This helps to allow for performance at any time during the day, and not just around the times at which the training tours were captured. We achieve this by merging tours that are adjacent in time, and updating the word-co-occurrence matrix M based on the merged set of features for each location. Consider two training images Xit0 and Xit2 captured at location i at times t0 and t2 respectively. Due to the smooth nature of natural illumination changes, we can assume that any feature that occurs in both Xit0 and Xit2 is also likely to appear in the hypothetical image Xit1 captured at time t1. A feature that occurs in Xit0, i.e., a daytime image, but not in Xit2, i.e., a twilight image, may also still appear in Xit1, up until a time just before t2. Merging the co-occurrence statistics at t0 and t2 in this way maximizes the recall of feature correspondences at all times between t0 and t2. Matrix M is then created by merging co-occurrence statistics between all images captured in adjacent tours. For example, the statistics from the tour at 4pm are merged with the tour at 2pm to create one set of “virtual” co-occurrence statistics, and also with the tour at 6pm to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating between different times of day)

generating a map comprising at least some of said one or more daytime features and at least some of said one or more nighttime features, (Matrix M, i.e., a map, is then created by merging co-occurrence statistics, i.e., features, between all images captured in adjacent tours. For example, the statistics from the tour at 4pm, i.e., a twilight time, are merged with the tour at 2pm, i.e., a daytime, to create one set of “virtual” co-occurrence statistics, and also with the tour at 6pm, i.e., a nighttime, to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating Between Different Times of Day and Fig. 6)

determining a position of a nighttime feature of said one or more nighttime features relative to a daytime feature of said one or more daytime features based on at least some of said one or more twilight visual features (images are captured throughout all times of day, e.g., 2pm, 4pm, 6pm, 7pm, 8pm, and 10pm - See at least Fig 8. The features in these images are extracted and assigned to the closest visual word in the feature space, with all visual words belonging to a single visual dictionary - See at least pg. 3213, §A. Image Quantisation) twilight visual features including twilight straight lines and/or twilight urban lights; (the extracted features can include urban lights - See at least Fig. 2-Visual Word D)

[fusing] one or more of the twilight visual features with one or more of the daytime features in the map based on at least one commonality between one or more twilight [features] and the daytime features; and (Feature Co-occurrence Maps (Coco-Map) that exploits the discriminative power of local features, and still allows for location recognition at multiple times of the day, i.e., twilight and daytime. This is achieved by learning the co-occurrence statistics of features in quantised feature space and quantised image space, and fusing together these statistics from different times of the day into one compact model for each location. Query images are then compared to each location model by finding groups of features in the query image that have also co-occurred, with the same spatial relationships, in one of the training images for that location - See at least pg. 3212, §I. Introduction, ¶4; Figure 1(b) shows daytime features, e.g., the tree on the right side of the picture, not available during the nighttime, used to facilitate location estimation)

[fusing] one or more of the twilight visual features with one or more nighttime features in the map based on at least one commonality between one or more twilight urban lights and the nighttime features; and (the system fuses together image data from multiple times of the day in one compact model. This image data includes street lights, i.e., urban lights, during twilight and nighttime - See at least pg. 3212, §I. Introduction and Fig. 1)

determining alignment in the map of: (i) at least one daytime feature and/or (ii) at least one nighttime feature that was not present in the at least one image captured at twilight. (The appearance of locations between these times is, however, not explicit, but we can estimate these appearances by interpolating between those images we do have. This helps to allow for performance at any time during the day, and not just around the times at which the training tours were captured. We achieve this by merging tours that are adjacent in time, and updating the word-co-occurrence matrix M based on the merged set of features for each location. Consider two training images Xit0 and Xit2 captured at location i at times t0 and t2 respectively. Due to the smooth nature of natural illumination changes, we can assume that any feature that occurs in both Xit0 and Xit2 is also likely to appear in the hypothetical image Xit1 captured at time t1. A feature that occurs in Xit0, i.e., a daytime image, but not in Xit2, i.e., a twilight image, may also still appear in Xit1, up until a time just before t2. Merging the co-occurrence statistics at t0 and t2 in this way maximizes the recall of feature correspondences at all times between t0 and t2. Matrix M is then created by merging co-occurrence statistics between all images captured in adjacent tours. For example, the statistics from the tour at 4pm are merged with the tour at 2pm to create one set of “virtual” co-occurrence statistics, and also with the tour at 6pm to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating between different times of day)

	Johns discloses fusing images together and determining the spatial relationship between common features within the images. Johns does not explicitly state that the fusing process includes aligning the images/features in the image. However, Sharma discloses image fusion techniques and applications and teaches: 

aligning [] visual features with [] features [] (the steps for image fusion are: image input, image registration [2], image resampling, image fusion. These steps result in a final fused image - See at least Fig. 1)

	In summary, Johns discloses identifying features from multiple images, determining the spatial relationship between the features in the images, then fusing the images together into one image containing features from the multitude of images. Johns does not explicitly disclose that the process of fusing includes an alignment step. However, Sharma discloses image fusion techniques and applications and teaches that fusion techniques, including feature level fusion, include the processing step of image registration, i.e., alignment of the images.
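For illustration only, the four-step sequence attributed to Sharma (image input, image registration, image resampling, image fusion) can be sketched as follows. This is a minimal sketch, not Sharma's method: registration is assumed to be a brute-force search over small integer translations, fusion is assumed to be a per-pixel maximum, images are plain lists of rows, and the names shift, register, and fuse are hypothetical.

```python
def shift(img, dy, dx, fill=0):
    """Resample img onto the reference grid after translating it by (dy, dx)."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def register(ref, mov, max_shift=2):
    """Return the integer translation (dy, dx) that best aligns mov to ref.

    The score is the mean squared difference over the overlapping pixels.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0, 0
            for y in range(h):
                for x in range(w):
                    sy, sx = y - dy, x - dx
                    if 0 <= sy < h and 0 <= sx < w:
                        err += (ref[y][x] - mov[sy][sx]) ** 2
                        n += 1
            if n == 0:
                continue  # no overlap at this shift
            score = err / n
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


def fuse(ref, mov):
    """Register, resample, then fuse by taking the per-pixel maximum."""
    dy, dx = register(ref, mov)
    aligned = shift(mov, dy, dx)
    return [[max(a, b) for a, b in zip(r, m)] for r, m in zip(ref, aligned)]
```

The ordering is the point of the sketch: the translation is estimated and applied before any pixel values are combined, so a feature present only in the second image lands at its registered position in the fused result.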

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns to provide for the image registration, as taught in Sharma, to help in sharpening the images, improving geometric corrections, enhancing certain features that are not visible in either of the images, replacing the defective data, and complementing the data sets for better decision making. (At Sharma pg. 1082, §I. Introduction)

	Johns further teaches that the features may include the edge of buildings (see Fig. 2 Visual Word B). Johns does not explicitly teach the features are straight lines. However, Bosse discloses vanishing points and 3D lines from omnidirectional video and teaches: 

at least one commonality between [] straight lines (the system tracks local, parallel 3D line segments that share a common vanishing point. These segments have six DOFs, which we partition into three groups: two for the direction of the line, two for the perpendicular offset of the line from the origin, and two for the distances of the endpoints along the line. The system updates 3D lines by projecting them into the current view, and comparing them with the corresponding line extracted from the current view, i.e., finding commonality between them - See at least pg. 514, § 2.3 Mapping 3-D Lines)

	In summary, Johns discloses using scene features, some of which will be available only during specific times of day due to different illumination factors; these scene features include at least a section of an edge of a building. Johns further teaches that their system would be applicable to SLAM methods. Johns does not explicitly teach that the features it extracts include straight lines. However, Bosse discloses vanishing points and 3D lines from omnidirectional video and teaches defining landmarks with vanishing points and 3D lines. Bosse further teaches that their system is in the field of SLAM.
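For illustration only, the six-DOF partition attributed to Bosse above (two DOFs for the line direction, two for the perpendicular offset from the origin, two for the endpoint distances along the line) can be sketched as a data structure. This is a minimal sketch and not Bosse's implementation: encoding the direction as azimuth and elevation angles, the particular basis chosen for the perpendicular plane, and the name LineSegment3D are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class LineSegment3D:
    azimuth: float    # direction DOF 1 (radians), assumed encoding
    elevation: float  # direction DOF 2 (radians), assumed encoding
    offset_u: float   # perpendicular offset DOF 1
    offset_v: float   # perpendicular offset DOF 2
    s_start: float    # signed distance of the first endpoint along the line
    s_end: float      # signed distance of the second endpoint along the line

    def basis(self):
        """Return the unit direction d and an orthonormal pair (u, v) normal to d."""
        ca, sa = math.cos(self.azimuth), math.sin(self.azimuth)
        ce, se = math.cos(self.elevation), math.sin(self.elevation)
        d = (ce * ca, ce * sa, se)
        u = (-sa, ca, 0.0)
        v = (-se * ca, -se * sa, ce)
        return d, u, v

    def endpoints(self):
        """Return the two 3D endpoints of the segment."""
        d, u, v = self.basis()
        # Point of the line closest to the origin.
        foot = tuple(self.offset_u * ui + self.offset_v * vi for ui, vi in zip(u, v))
        return tuple(
            tuple(f + s * di for f, di in zip(foot, d))
            for s in (self.s_start, self.s_end)
        )
```

Under this encoding, two segments with the same azimuth and elevation are parallel and therefore share a vanishing point, which is the grouping the cited passage describes.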

 	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns and Sharma to provide for the estimation of the 3D position and orientation of scene landmarks (VPs and 3D lines), as taught in Bosse, to naturally construct a persistent map of scene landmarks to greatly ease the data association problem. (At Bosse pg. 513, §1. Introduction, ¶ 3)

	Regarding claim 20, Johns discloses a method for appearance-based localisation over the course of the day and teaches:

extracting twilight visual features from at least one twilight image captured during twilight, wherein the twilight visual features comprise twilight straight lines and/or twilight urban lights; (the system fuses together image data from multiple times of the day in one compact model. This image data includes street lights, i.e., urban lights, during twilight and nighttime - See at least pg. 3212, §I. Introduction and  Fig. 1)

extracting daytime features from at least one day image captured during the day; (We adopt the Bag-Of-Words model and quantise local features such that each is assigned to the closest visual word in feature space, with all visual words belonging to a single visual dictionary, i.e., extracting features from the image - See at least pg. 3213, §A. Image Quantisation; Quantising features takes place across all times of the day, i.e., daytime - See at least pg. 3218, §V. Conclusions and Fig. 8)

extracting nighttime features from at least one nighttime image captured at night; and (We adopt the Bag-Of-Words model and quantise local features such that each is assigned to the closest visual word in feature space, with all visual words belonging to a single visual dictionary, i.e., extracting features from the image - See at least pg. 3213, §A. Image Quantisation; Quantising features takes place across all times of the day, i.e., nighttime - See at least pg. 3218, §V. Conclusions and Fig. 8)

generating a map comprising at least some of said daytime features and at least some of said nighttime features, said generating comprising: (Matrix M, i.e., a map, is then created by merging co-occurrence statistics, i.e., features, between all images captured in adjacent tours. For example, the statistics from the tour at 4pm, i.e., a twilight time, are merged with the tour at 2pm, i.e., a daytime, to create one set of "virtual" co-occurrence statistics, and also with the tour at 6pm, i.e., a nighttime, to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating Between Different Times of Day and Fig. 6)

determining a position in said map of nighttime features relative to the daytime features based on the extracted twilight visual features, (Matrix M, i.e., a map, is then created by merging co-occurrence statistics, i.e., features, between all images captured in adjacent tours. For example, the statistics from the tour at 4pm, i.e., a twilight time, are merged with the tour at 2pm, i.e., a daytime, to create one set of "virtual" co-occurrence statistics, and also with the tour at 6pm, i.e., a nighttime, to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating Between Different Times of Day and Fig. 6)

[fusing] the twilight visual features with daytime features in the map using at least one commonality between the twilight [features] and the daytime features (Feature Co-occurrence Maps (Coco-Map) that exploits the discriminative power of local features, and still allows for location recognition at multiple times of the day, i.e., twilight and daytime. This is achieved by learning the co-occurrence statistics of features in quantised feature space and quantised image space, and fusing together these statistics from different times of the day into one compact model for each location. Query images are then compared to each location model by finding groups of features in the query image that have also co-occurred, with the same spatial relationships, in one of the training images for that location - See at least pg. 3212, §I. Introduction, ¶4; Figure 1(b) shows daytime features, e.g., the tree on the right side of the picture, not available during the nighttime, used to facilitate location estimation)

[fusing] twilight visual features with nighttime features in the map using at least one commonality between the twilight urban lights and the nighttime features (the system fuses together image data from multiple times of the day in one compact model. This image data includes street lights, i.e., urban lights, during twilight and nighttime - See at least pg. 3212, §I. Introduction and  Fig. 1)

determining at least one alignment between a twilight visual feature and a feature in the map based on a feature not present in the twilight visual features. (The appearance of locations between these times is, however, not explicit, but we can estimate these appearances by interpolating between those images we do have. This helps to allow for performance at any time during the day, and not just around the times at which the training tours were captured. We achieve this by merging tours that are adjacent in time, and updating the word-co-occurrence matrix M based on the merged set of features for each location. Consider two training images Xit0 and Xit2 captured at location i at times t0 and t2 respectively. Due to the smooth nature of natural illumination changes, we can assume that any feature that occurs in both Xit0 and Xit2 is also likely to appear in the hypothetical image Xit1 captured at time t1. A feature that occurs in Xit0, i.e., a daytime image, but not in Xit2, i.e., a twilight image, may also still appear in Xit1, up until a time just before t2. Merging the co-occurrence statistics at t0 and t2 in this way maximizes the recall of feature correspondences at all times between t0 and t2. Matrix M is then created by merging co-occurrence statistics between all images captured in adjacent tours. For example, the statistics from the tour at 4pm are merged with the tour at 2pm to create one set of "virtual" co-occurrence statistics, and also with the tour at 6pm to create another set. Finally, each virtual set of statistics is then combined in M to provide statistics representing all illumination conditions throughout the day and night - See at least pg. 3215, §E. Interpolating Between Different Times of Day)

	Johns discloses fusing images together and determining the spatial relationship between common features within the images. Johns does not explicitly state that the fusing process includes aligning the images/features in the image. However, Sharma discloses image fusion techniques and applications and teaches: 

aligning [] visual features with [] features [] (the steps for image fusion are: image input, image registration [3], image resampling, image fusion. These steps result in a final fused image - See at least Fig. 1)

	In summary, Johns discloses identifying features from multiple images, determining the spatial relationship between the features in the images, then fusing the images together into one image containing features from the multitude of images. Johns does not explicitly disclose that the process of fusing includes an alignment step. However, Sharma discloses image fusion techniques and applications and teaches that fusion techniques, including feature level fusion, include the processing step of image registration, i.e., alignment of the images.

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns to provide for the image registration, as taught in Sharma, to help in sharpening the images, improving geometric corrections, enhancing certain features that are not visible in either of the images, replacing the defective data, and complementing the data sets for better decision making. (At Sharma pg. 1082, §I. Introduction)

	Johns further teaches that the features may include the edge of buildings (see Fig. 2 Visual Word B). Johns does not explicitly teach the features are straight lines. However, Bosse discloses vanishing points and 3D lines from omnidirectional video and teaches: 

at least one commonality between [] straight lines (the system tracks local, parallel 3D line segments that share a common vanishing point. These segments have six DOFs, which we partition into three groups: two for the direction of the line, two for the perpendicular offset of the line from the origin, and two for the distances of the endpoints along the line. The system updates 3D lines by projecting them into the current view, and comparing them with the corresponding line extracted from the current view, i.e., finding commonality between them - See at least pg. 514, § 2.3 Mapping 3-D Lines)

	In summary, Johns discloses using scene features, some of which will be available only during specific times of day due to different illumination factors; these scene features include at least a section of an edge of a building. Johns further teaches that their system would be applicable to SLAM methods. Johns does not explicitly teach that the features it extracts include straight lines. However, Bosse discloses vanishing points and 3D lines from omnidirectional video and teaches defining landmarks with vanishing points and 3D lines. Bosse further teaches that their system is in the field of SLAM.

 	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns and Sharma to provide for the estimation of the 3D position and orientation of scene landmarks (VPs and 3D lines), as taught in Bosse, to naturally construct a persistent map of scene landmarks to greatly ease the data association problem. (At Bosse pg. 513, §1. Introduction, ¶ 3)

	Regarding claim 21, Johns further teaches:

[] forming the map by then merging aligned features. (Feature Co-occurrence Maps (Coco-Map) that exploits the discriminative power of local features, and still allows for location recognition at multiple times of the day, i.e., twilight and daytime. This is achieved by learning the co-occurrence statistics of features in quantised feature space and quantised image space, and fusing together these statistics from different times of the day into one compact model for each location. Query images are then compared to each location model by finding groups of features in the query image that have also co-occurred, with the same spatial relationships, in one of the training images for that location - See at least pg. 3212, §I. Introduction)

	Johns does not explicitly teach that the merging occurs after aligning the images. However, Sharma further teaches: 

wherein, after the aligning, forming the [image] by then merging aligned features. (the second step of the process of fusion, i.e., merging, is image registration, i.e., aligning, and the fourth step is image fusion. Therefore, the aligning occurs prior to the fusing. - See at least Fig. 1)

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns and Bosse to provide for the image registration, as taught in Sharma, to help in sharpening the images, improving geometric corrections, enhancing certain features that are not visible in either of the images, replacing the defective data, and complementing the data sets for better decision making. (At Sharma pg. 1082, §I. Introduction)

	Claims 10, 13, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Johns in view of Sharma and Bosse, as applied to claim 1, and in further view of Davison et al. (MonoSLAM: Real-Time Single Camera SLAM, “Davison”).

	Regarding claim 10, Johns discloses the application of their system to the field of mobile robotics. Mobile robotics are known to contain navigation and visual sensors, e.g., GPS, accelerometers etc. Johns does not explicitly disclose that their system has at least one or any combination of: at least one GPS sensor, at least one dead-reckoning sensor, at least one accelerometer, at least one gyroscope, at least one time of flight camera, at least one Lidar sensor, at least one odometer, at least one magnetometer, and at least one altitude sensor. However, Davison discloses MonoSLAM: Real-Time Single Camera SLAM and teaches: 

wherein the estimation of the location is facilitated by at least one or any combination of: at least one GPS sensor, at least one dead-reckoning sensor, at least one accelerometer, at least one gyroscope, at least one time of flight camera, at least one Lidar sensor, at least one odometer, at least one magnetometer, and at least one altitude sensor. (Along with other proprioceptive sensors, HRP-2 is equipped with a 3-axis gyro in the chest which reports measurements of the body’s angular velocity at 200 Hz - See at least pg. 1063, §5.2 Gyro)

	In summary, Johns discloses applying their system to the field of robotics, which generally uses some type of location sensor. Johns does not explicitly teach using the above claimed sensors in their disclosure. Further, Johns does not teach that their system is using SLAM, but does explicitly teach that their system would be applicable to loop closure for SLAM systems, such as in Davison (see pg. 3212, §I. Introduction, ¶1). Davison teaches using a gyro to measure the angular velocity of the robot.

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns, Sharma, and Bosse to provide for the MonoSLAM, as taught in Davison and suggested by Johns, to naturally construct a persistent map of scene landmarks to be referenced indefinitely in a state-based framework and permit loop closures to correct long-term drift. (At Davison pg. 1053, §1. Introduction, ¶5)

	Regarding claim 13, Johns does not explicitly teach, but Davison further teaches: 

wherein the method is used as a Simultaneous Localization and Mapping (SLAM) method. (MonoSLAM provides real-time SLAM for one of the leading humanoid robot platforms, HRP-2 [52] as it moves around a cluttered indoor workspace - See at least pg. 1063, §5. Results: Humanoid Robot SLAM)

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns, Sharma, and Bosse to provide for the MonoSLAM, as taught in Davison and suggested by Johns and Bosse, to naturally construct a persistent map of scene landmarks to be referenced indefinitely in a state-based framework and permit loop closures to correct long-term drift. (At Davison pg. 1053, §1. Introduction, ¶5)

	Regarding claim 17, Johns does not explicitly teach, but Davison further teaches: 

A processing unit configured to execute the method of claim 1.  (the system requires a processor, e.g., 1.6 GHz Pentium M processor, to run - See at least pg. 1064, §6.2 Processing Requirements)

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns, Sharma, and Bosse to provide for the MonoSLAM, as taught in Davison and suggested by Johns and Bosse, to naturally construct a persistent map of scene landmarks to be referenced indefinitely in a state-based framework and permit loop closures to correct long-term drift. (At Davison pg. 1053, §1. Introduction, ¶5)

	Regarding claim 18, Johns does not explicitly teach, but Davison further teaches:

wherein the processing unit is part of a mobile robot and facilitates the mobile robot's navigation and localization. (the MonoSLAM method was used in the HRP-2 humanoid robotic platform. This platform contains a computer system, i.e., a processing unit, to aid its operations, i.e., navigation and localization - See at least pg. 1063, §5. Results: Humanoid Robot SLAM)

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns, Sharma, and Bosse to provide for the MonoSLAM, as taught in Davison and suggested by Johns and Bosse, to naturally construct a persistent map of scene landmarks to be referenced indefinitely in a state-based framework and permit loop closures to correct long-term drift. (At Davison pg. 1053, §1. Introduction, ¶5)

	Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Johns in view of Sharma and Bosse, as applied to claim 1, and in further view of National Weather Service.

	Regarding claim 14, Johns discloses using images throughout the day, including those during twilight hours, to determine the mapping and location of the vehicle. Johns does not explicitly disclose that twilight is defined by the sun being located between 0° and 18° below the horizon, preferably between 0° and 12° below the horizon, such as between 0° and 6° below the horizon. However, the National Weather Service discloses the definitions of twilight and teaches: 

wherein twilight is defined by the sun being located between 0° and 18° below the horizon, preferably between 0° and 12° below the horizon, such as between 0° and 6° below the horizon. (civil twilight begins in the morning, or ends in the evening, when the geometric center of the sun is 6 degrees below the horizon - See at least ¶2 and Fig. 1)

	In summary, Johns discloses using images taken at twilight. Johns does not explicitly teach that twilight is defined by the sun being located between 0° and 18° below the horizon, preferably between 0° and 12° below the horizon, such as between 0° and 6° below the horizon. However, the National Weather Service discloses the definitions of twilight and teaches that the definition of twilight includes the claimed degrees below the horizon.

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns, Sharma, and Bosse to provide for the definitions of twilight, as taught in National Weather Service, to cover multiple purposes e.g., civil, nautical, or astronomical. 

	Regarding claim 15, Johns does not explicitly teach, but National Weather Service further teaches:

wherein twilight is defined by the sun being located between 0° and 12° below the horizon. (Nautical Twilight begins in the morning, or ends in the evening, when the geometric center of the sun is 12 degrees below the horizon - See at least ¶3 and Fig. 1)

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns, Sharma, and Bosse to provide for the definitions of twilight, as taught in National Weather Service, to cover multiple purposes e.g., civil, nautical, or astronomical.

	Regarding claim 16, Johns does not explicitly teach, but National Weather Service further teaches:

wherein twilight is defined by the sun being located between 0° and 6° below the horizon. (civil twilight begins in the morning, or ends in the evening, when the geometric center of the sun is 6 degrees below the horizon - See at least ¶2 and Fig. 1)

	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to have modified the method for appearance-based localisation over the course of the day of Johns, Sharma, and Bosse to provide for the definitions of twilight, as taught in National Weather Service, to cover multiple purposes e.g., civil, nautical, or astronomical.
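
For illustration only, the twilight definitions relied upon for claims 14-16 can be expressed as a classification by solar depression angle. This is a minimal sketch under stated assumptions: the civil and nautical limits (6 and 12 degrees) follow the National Weather Service passages quoted above, the 18 degree limit for astronomical twilight is the conventional definition and is not quoted in this action, range boundaries are assumed to be inclusive, and the names twilight_phase, CLAIMED_RANGES, and claims_satisfied are hypothetical.

```python
def twilight_phase(depression_deg):
    """Classify by how far the sun's geometric center is below the horizon.

    depression_deg: degrees below the horizon (negative means above it).
    """
    if depression_deg < 0:
        return "day"
    if depression_deg <= 6:
        return "civil"
    if depression_deg <= 12:
        return "nautical"
    if depression_deg <= 18:
        return "astronomical"
    return "night"


# Broadest range recited in each claim, in degrees below the horizon.
CLAIMED_RANGES = {14: (0, 18), 15: (0, 12), 16: (0, 6)}


def claims_satisfied(depression_deg):
    """List the claims whose recited range contains the given angle."""
    return [
        claim
        for claim, (low, high) in CLAIMED_RANGES.items()
        if low <= depression_deg <= high
    ]
```

For example, an image captured with the sun 4 degrees below the horizon falls in civil twilight and within the ranges of all three claims, while one captured at 10 degrees falls in nautical twilight and within the ranges of claims 14 and 15 only.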

Conclusion
	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Kaneko et al., Humanoid Robot HRP-2, discloses the computer system used in the HRP-2 platform of Davison.
	
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHASE L COOLEY whose telephone number is (303)297-4355.  The examiner can normally be reached on Monday-Thursday 7-5MT.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aniss Chad can be reached on 571-270-3832.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.L.C./Examiner, Art Unit 3662       

/ANISS CHAD/Supervisory Patent Examiner, Art Unit 3662

        1 Image registration is an image processing technique used to align multiple scenes into a single integrated image. It helps overcome issues such as image rotation, scale, and skew that are common when overlaying images. (emphasis added) https://web.archive.org/web/20131102081148/https://www.mathworks.com/discovery/image-registration.html
        2 Image registration is an image processing technique used to align multiple scenes into a single integrated image. It helps overcome issues such as image rotation, scale, and skew that are common when overlaying images. (emphasis added) https://web.archive.org/web/20131102081148/https://www.mathworks.com/discovery/image-registration.html
        3 Image registration is an image processing technique used to align multiple scenes into a single integrated image. It helps overcome issues such as image rotation, scale, and skew that are common when overlaying images. (emphasis added) https://web.archive.org/web/20131102081148/https://www.mathworks.com/discovery/image-registration.html