Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Response to Amendment
Claims 1, 3-5, 9, 12, 14, 16, 17, 19 are amended.  Claims 1, 3-12, 14, 16-19 are pending.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 3-12, 14, 16-19  have been considered but are moot because they are directed to the newly amended claims including the amended independent claims that change the scope of the claims as a whole and are open to new grounds of rejection.
35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “feature extractor configured to select,” “image module configured to calculate,” and “mapper configured to construct.” (Independent Claim 12).
Since the claim limitation(s) invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, Claims 12, 14, 16-18 has/have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.  
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph limitation: 
See [0024-0030], in reference to Fig. 1-2, wherein a point cloud generator 121 includes a feature extractor, image module, and mapper, and wherein the point cloud generator is implemented as part of a server 125 with hardware controller or as part of a mobile device.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-5, 12, 14, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Raif et al. (US 9430872) in view of Sangster (US 20100156834).
Re claim 1, Raif teaches a method for three-dimensional point cloud generation (see col 1, lines 15-20, point cloud is a set of points in a 3D coordinate system).
the method comprising:
identifying a plurality of images corresponding to a geographic area (see claim 1, receiving a first plurality of passive images, each of the passive images…corresponding to a geographic location).
and including image descriptors (see col 3, lines 22-47, wherein metadata and imagery itself is used for predicting point cloud quality, metrics derived from the imagery such as cloud cover percentages, intensity variance, image-to-image correlation and covariance).
wherein two or more of the plurality of images adaptable into a spatial relationship based on positional information associated with the plurality of images (see claim 1, receiving a first plurality of passive images, each of the passive images…corresponding to a geographic location…obtain a second plurality of passive images, the second plurality of passive images being a subset of the first plurality of passive images).  Hence, the plurality of passive images are in a spatial relationship based on positional information (spatially corresponding to a geographic location/position) and (see col 6, lines 22-32, in reference to Fig. 3, item 142, point cloud parameters, wherein parameters include the latitude, longitude, and radius of desired point cloud).
selecting a subset of neighboring images from the plurality of images using a pairing factor 
(see col 6, lines 22-54, wherein at 144, images downloaded at 140 are chipped and only those images that fully cover the view under review are kept), (see col 4, lines 39-51, “When calculating a tiepoint via cross correlation, a candidate pixel location from a reference image is chosen, and a small region around that candidate pixel location is chipped from the "reference" image (called a template window). Then, the template window "slides" over a search region in the "reference complement" image, generating a correlation score at each candidate location. The correlation scores comprise a 2D correlation surface, where the highest score can be found at the location where the reference candidate pixel location best matches a location in the reference complement image. This pairing of a pixel location in the reference image and a pixel location in the reference complement image is a tiepoint”), and (see col 6, lines 33-38, “At 144, images downloaded at 140 are chipped and only those images that fully cover the view under review are kept. The images that fully cover the view under review then become the set of imagery for performance prediction processing. In addition, at 144, metadata is extracted from each image”).
calculating, using a processor, point matches within the subset of neighboring images based on the image descriptors, (in reference to Fig. 3, see col 6, lines 39-46: At 146, the images are clustered and a performance prediction is made for each cluster. In some embodiments, a first pass is made based on a metadata-based performance prediction and a subset of image clusters is identified. That subset of image clusters then receives the more computationally intensive correlator-based performance evaluation. One or more image clusters is identified and passed to 148 to be used to build a point cloud), (see col 7, line 58-col 8, line 11: “a method for selecting a subset of images to use in building a point cloud…five images selected from available and appropriate images…list of all possible combinations of the five images selected from available and appropriate images…metadata-based performance score…Next, we compute the correlator-based performance score. As noted above, this calculation is computationally intensive so, in the example embodiment shown in FIG. 6…”), and (see col 4, lines 39-60, wherein correlation scores are generated at tiepoints, pairing of a pixel location in the reference image and a pixel location in the reference complement image).
and constructing, using the processor, a three-dimensional point cloud for at least a portion of the geographic area, from the point matches and the relationship, using the image descriptors from the set of neighboring images (in reference to Fig. 3, see col 6, lines 39-46: At 146, the images are clustered and a performance prediction is made for each cluster. In some embodiments, a first pass is made based on a metadata-based performance prediction and a subset of image clusters is identified. That subset of image clusters then receives the more computationally intensive correlator-based performance evaluation. One or more image clusters is identified and passed to 148 to be used to build a point cloud), (see col 6, lines 18-54, in reference to Fig. 3, wherein at 148 a point cloud is built from image preparation 144 including relationships using image descriptors, and 146, performance predictions using point matches from the set of neighboring images in image directory 140 with geographic area information such as point cloud parameters like latitude, longitude, and radius of desired point cloud), and (see claim 1, generating point clouds from passive images…corresponding to geographic location).
Raif does not explicitly teach wherein the pairing factor includes a temporal factor such that the subset of neighboring images include neighbors in time having timestamps within a predetermined range and a spatial factor such that the subset of neighboring images include neighbors in geometric space based on the positional information.
However, Sangster teaches selecting a subset of neighboring images from a plurality of images (see [0097], grouping into a smaller subset of images compared to an original collection)
using a pairing factor including a temporal factor such that the subset of neighboring images include neighbors in time having timestamps within a predetermined range (see [0105] & [0109], in reference to Fig. 7, wherein temporal classification of images can be carried out based on the range of timestamps of images as shown in Fig. 7, item 705) and (see [0112], wherein relevant images from a collection of images are selected and arranged in groups based on temporal events…If the images include timestamps, these are used as factors to group the images).
and a spatial factor such that the subset of neighboring images include neighbors in geometric space based on the positional information (see [0105] & [0107], in reference to Fig. 7, item 706, wherein for each image an entry of the geographic coordinates related to the image is shown) and (see [0112], wherein relevant images from a collection of images are selected and arranged in groups based on spatial factors such as geographic location…If the images include geotag metadata, that is used as a factor to group the images based on position information, making them neighbors in geometric space).
Raif and Sangster teach claim 1.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Raif’s selecting a subset of neighboring images from a plurality of images using a pairing factor to explicitly include pairing factors that include a temporal factor and a spatial factor, as taught by Sangster, as the references are pertinent to the problem of selecting a subset of images based on pairing factors, and are in the analogous art of image processing.  An advantage of the modification is that it achieves the result of using temporal and spatial factors to group a subset of a plurality of images, thus creating a better collection of relevant images for further processing (see [0005] & [0114], selecting and grouping a subset of images that are more targeted to be of interest to the user for further processing).
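For illustration only, the combined pairing factor discussed above (a temporal factor bounded by a timestamp range and a spatial factor based on positional information) can be sketched as follows. This is a minimal sketch under stated assumptions and does not characterize the actual implementation of Raif, Sangster, or the present application: the `Image` record, its field names, the thresholds `max_dt_s` and `max_dist_m`, and the use of a great-circle distance as the spatial test are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Image:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    timestamp: float  # capture time in seconds
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def select_neighbors(initial, images, max_dt_s, max_dist_m):
    """Keep images that are neighbors of `initial` in both time and space.

    Temporal factor: timestamps differ by at most max_dt_s seconds.
    Spatial factor: capture positions lie within max_dist_m meters.
    """
    return [
        img
        for img in images
        if img is not initial
        and abs(img.timestamp - initial.timestamp) <= max_dt_s
        and haversine_m(initial.lat, initial.lon, img.lat, img.lon) <= max_dist_m
    ]
```

In this sketch an image is retained only when both factors are satisfied, which corresponds to the claimed subset including neighbors in time and neighbors in geometric space; other combinations of the two factors (for example, a weighted score) are equally conceivable.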
Re claim 3, Raif and Sangster teach claim 1.  Furthermore, Sangster teaches wherein the predetermined range is a time range defining an amount of time between the timestamps of the subset of neighboring images (see [0105] & [0109], in reference to Fig. 7, wherein temporal classification of images can be carried out based on the range of timestamps of images as shown in Fig. 7, item 705, and see item 707 grouping images by event (based on shooting date as an example of a predetermined time range)) and (see [0112], wherein relevant images from a collection of images are selected and arranged in groups based on temporal events…If the images include timestamps, these are used as factors to group the images).  For motivation, see claim 1.
Re claim 4, Raif and Sangster teach claim 1.  Furthermore, Sangster teaches wherein the positional information includes position coordinates and/or heading values (see Fig. 7, item 706, wherein geotag includes coordinate positions).  For motivation, see claim 1.
Re claim 5, Raif and Sangster teach claim 1.  Furthermore, Raif teaches wherein selecting a set of neighboring images from the plurality of images using a pairing factor further comprises:
identifying an initial image; performing comparisons of other images in the plurality of images to the initial image using the pairing factor; and identifying the set of neighboring images in response to the comparison (see col 4, lines 39-51, “When calculating a tiepoint via cross correlation, a candidate pixel location from a reference image is chosen, and a small region around that candidate pixel location is chipped from the "reference" image (called a template window). Then, the template window "slides" over a search region in the "reference complement" image, generating a correlation score at each candidate location. The correlation scores comprise a 2D correlation surface, where the highest score can be found at the location where the reference candidate pixel location best matches a location in the reference complement image. This pairing of a pixel location in the reference image and a pixel location in the reference complement image is a tiepoint”), and (see col 6, lines 33-38, “At 144, images downloaded at 140 are chipped and only those images that fully cover the view under review are kept. The images that fully cover the view under review then become the set of imagery for performance prediction processing. In addition, at 144, metadata is extracted from each image”).
Claim 12 claims limitations similar in scope to claim 1 as an apparatus comprising memory and processing function units.  Raif and Sangster teach claim 1, herein incorporated by reference.  Furthermore, Raif teaches a processor with memory (see col 11, lines 4-19, processor and memory storage).
Claim 14 claims limitations similar in scope to claim 3 and is rejected for at least the reasons above.
Claim 16 claims limitations similar in scope to claim 5 and is rejected for at least the reasons above.
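For illustration only, the cross-correlation tiepoint procedure quoted from Raif above (a template window chipped from a reference image slides over a search region, a correlation score is generated at each candidate location, and the highest score marks the tiepoint) can be sketched as follows. This is a minimal sketch assuming a normalized cross-correlation score and images represented as nested lists of pixel intensities; the function names `ncc` and `best_match` are hypothetical, and Raif does not state which correlation measure or data layout is used.

```python
from math import sqrt


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equal-size 2D patches, in [-1, 1]."""
    flat_a = [v for row in patch_a for v in row]
    flat_b = [v for row in patch_b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = sqrt(
        sum((x - mean_a) ** 2 for x in flat_a) * sum((y - mean_b) ** 2 for y in flat_b)
    )
    # A constant patch has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def best_match(template, search):
    """Slide `template` over `search` and return (best score, (row, col)).

    The scores over all candidate offsets form the 2D correlation surface;
    the offset with the highest score is taken as the tiepoint location.
    """
    t_rows, t_cols = len(template), len(template[0])
    s_rows, s_cols = len(search), len(search[0])
    best_score, best_loc = -2.0, None
    for r in range(s_rows - t_rows + 1):
        for c in range(s_cols - t_cols + 1):
            window = [row[c:c + t_cols] for row in search[r:r + t_rows]]
            score = ncc(template, window)
            if score > best_score:
                best_score, best_loc = score, (r, c)
    return best_score, best_loc
```

The returned offset pairs the candidate pixel location in the reference image with its best-matching location in the second image, which is the pairing Raif refers to as a tiepoint.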

Claim 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Raif et al. (US 9430872) in view of Sangster (US 20100156834) and Mundy et al. (US 20210174580).

Re claim 6, Raif and Sangster teach claim 1.  Raif and Sangster do not explicitly teach receiving the plurality of images from a plurality of types of sources.  However, Mundy teaches receiving the plurality of images from a plurality of types of sources (see [0004], processes digital images, regardless of the source) and (see [0040], any type of image captured by any type of image capture source can be used including aerial images and ground images).
Raif, Sangster, and Mundy teach claim 6.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Raif and Sangster’s imaging system using a plurality of images to explicitly include a plurality of image sources, as taught by Mundy, as the references are in the analogous art of deriving point clouds with image data.  An advantage of the modification is that it achieves the result of processing a plurality of images regardless of the image source, as taught by Mundy.
Claim 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Raif et al. (US 9430872) in view of Sangster (US 20100156834) and Gallaway et al. (US 20210019937).
Re claim 7, Raif and Sangster teach claim 1.  Furthermore, Sangster teaches wherein positional information includes geographical coordinates (see Fig. 7, item 706, wherein geotag includes coordinate positions).  For motivation, see claim 1.
Raif and Sangster do not explicitly teach at least one angle.
However, Gallaway teaches wherein positional information includes geographic coordinates and at least one angle (see [0013], receiving image data for the scene from a plurality of sensors located at different angles with respect to the geographic scene), and ([0025], multiple images of a scene including different viewpoint angles), and (see [0004], image coordinates).
Raif, Sangster, and Gallaway teach claim 7.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Raif and Sangster’s imaging system to explicitly include spatial relationships based on positional information associated with a plurality of images, as taught by Gallaway, as the references are in the analogous art of grouping images into clusters for point cloud generation.  An advantage of the modification is that it achieves the result of explicitly using spatial relationships of a plurality of images for image processing, such as constructing point clouds from spatial image data.
Claims 8-9, 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Raif et al. (US 9430872) in view of Sangster (US 20100156834) and Nehmadi et al. (US 20180232947).
Re claim 8, Raif and Sangster teach claim 1.  Raif and Sangster do not explicitly teach wherein the positional information includes, at least in part, light detection and ranging (LIDAR) data.
However, Nehmadi teaches wherein the positional information includes, at least in part, light detection and ranging (LIDAR) data ([0009] LiDAR is similar to radar, but uses a laser light beam to create a virtual image instead of radio waves. The light beam is sent as a pulse in a specific direction toward a scene and a receiver detects reflections from the light beam pulse, which are then used to produce a three-dimensional virtual image. The distance accuracy of a LiDAR is excellent, and its range is limited only by the available peak power of the source laser. As the wavelengths used for LiDAR are much shorter than radar waves the angular resolution of LiDAR beam is typically an order of magnitude better than that of a radar), (see [0051], distance measurement computed with LiDAR), and ([0064] FIG. 4 is a flowchart illustrating a method for generating a 3D map according to an embodiment. The method is an iterative process, where the inputs streams of high-density images from passive sensors and low-density distance measurements from active sensors are processed to generate an output stream of joint high-density 3D map and high density 3D images. The process starts at S410 by acquiring a passive image, such as an image created by the camera optical sensors 278 and 280 of FIG. 2B. Next, at S420, an active distance data set (ADS) is acquired from an active sensor, such as the LiDAR sensors 282, 284 of FIG. 2B. The ADS may include the distance and speed information each point along a grid imposed on an image. Additionally, the ADS may include distance and speed information regarding various ROls identified within the image, e.g., vehicles or pedestrians). 
Raif, Sangster, and Nehmadi teach claim 8.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Raif and Sangster’s grouping of acquired images with positional data to explicitly include the use of LiDAR, as taught by Nehmadi, as the references are in the analogous art of aligning a plurality of images.  An advantage of the modification is that it achieves the result of explicitly using different sensors, such as LiDAR, to capture image positional data.
Re claim 9, Raif, Sangster, and Nehmadi teach claim 8.  Furthermore, Nehmadi teaches wherein the set of neighboring images are selected in response to the LIDAR data ([0064] FIG. 4 is a flowchart illustrating a method for generating a 3D map according to an embodiment. The method is an iterative process, where the inputs streams of high-density images from passive sensors and low-density distance measurements from active sensors are processed to generate an output stream of joint high-density 3D map and high density 3D images. The process starts at S410 by acquiring a passive image, such as an image created by the camera optical sensors 278 and 280 of FIG. 2B. Next, at S420, an active distance data set (ADS) is acquired from an active sensor, such as the LiDAR sensors 282, 284 of FIG. 2B. The ADS may include the distance and speed information each point along a grid imposed on an image. Additionally, the ADS may include distance and speed information regarding various ROls identified within the image, e.g., vehicles or pedestrians) and (see [0084], wherein LiDAR is used to accurately analyze the scene).  For motivation, see claim 8.
Claim 17 claims limitations similar in scope to claim 9 and is rejected for at least the reasons above.

Claims 10, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Raif et al. (US 9430872) in view of Sangster (US 20100156834) and Bogan et al. (US 20200082566).
Re claim 10, Raif and Sangster teach claim 1.  Raif and Sangster do not explicitly teach calculating a three-dimensional position for a probe using the three-dimensional point cloud.
However, Bogan teaches calculating a three-dimensional position for a probe using the three-dimensional point cloud ([0003] One task often performed by machine vision systems is to attempt to search for and identify the location and orientation of a pattern of interest within images. Some techniques use a model to represent the pattern of interest, which can include a plurality of probes. Each probe is a point of interest and associated data (e.g., a location and a vector). Each probe can be used to determine, for example, the measure of the similarity of a run-time image feature or region to a pattern feature or region at a specific location. The plurality of probes can be applied at a plurality of poses to the run-time image, and the information from the probes at each pose can be used to determine the most likely poses of the pattern in the run-time image) and ([0086] Referring to steps 206 through 212, as noted above, the method 200 can be used to perform a coarse phase of a 3D model alignment search in the 3D image. In some embodiments, the method 200 can search for an approximate pose of the 3D model in the field that can be further refined by subsequent steps. The approximate pose can include, for example, a 3D position that includes the (x, y, z) location as well as orientation data, such as roll, pitch, and/or yaw. Referring to step 206, in some embodiments the testing includes testing a set of probes of the 3D model to the field. For example, the machine vision system can test a set of probes of the model to the field to determine the score by summing the dot product of each probe and an associated vector in the field. In some embodiments, as discussed further in conjunction with FIG. 9, the score can be based on multiple values (e.g., multiple fields), such as crease edge information, occlusion boundary information, color information, intensity information, and/or the like), and ([0112] FIG. 
9 shows an exemplary method 900 for performing an initial search for a pose of a three-dimensional model, according to some embodiments. At step 902, the machine vision system stores a three-dimensional model that includes a set of probes. At step 904, the system receives 3D data (e.g., a point cloud, a depth image, etc.) of an object that includes a set of 3D data entries. At step 906, the system converts the three-dimensional data into a set of two or more fields. A first field includes a first set of values that are each indicative of a first characteristic of an associated data entry or a plurality of data entries from the 3D data entries. A second field includes a second set of values that are each indicative of a second characteristic of an associated data entry or a plurality of data entries from the 3D data entries. At step 908, the system tests a pose of the 3D model with the set of fields, including testing the set of probes to the set of fields, to determine a score for the pose. At step 910 the system determines whether the score meets a predetermined threshold. If the score meets the threshold, the method proceeds to step 912 and stores the pose (e.g., for subsequent refinement). If the score does not meet the threshold, the method proceeds to step 914 and determines whether there are further poses to test, and if so, the method proceeds back to step 908, otherwise the method ends at step 916), and (see [0070], “…In the 3D context, for example, the image can be a range image, a point cloud, and/or the like. As a general matter, and as discussed further herein, the machine vision system can train a 3D model of an object that includes a set of probes. The machine vision system can use the 3D model to search for the pose of the 3D model in runtime 3D images. As discussed herein, the machine vision system can process the 3D images to generate one or more 3D fields to facilitate searching for the pose of the 3D model. The field can include a set of vectors. 
The Cartesian components of the vectors (x, y, z) can be stored, which can imply the magnitude and direction, the field can store the actual (r, lat, long) values, and/or the like”).  Bogan teaches calculating a 3D position for a probe (x, y, z), using the 3D point cloud (processing image data such as point cloud).
Raif, Sangster, and Bogan teach claim 10.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Raif’s and Sangster’s image analysis system to explicitly include using a probe with 3D position information, as taught by Bogan, as the references are in the analogous art of image analysis systems using point cloud data.  An advantage of the modification is that it achieves the result of using probes to acquire image position data.
Claim 18 claims limitations similar in scope to claim 10 and is rejected for at least the reasons above.
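For illustration only, the probe scoring quoted from Bogan at [0086] above (determining a score by summing the dot product of each probe and an associated vector in the field) can be sketched as follows. This is a minimal sketch under stated assumptions: probes are represented as pairs of an (x, y, z) location and a direction vector, the field is a dictionary keyed by location, and probes with no associated field vector contribute nothing. These representations and the name `pose_score` are hypothetical and are not taken from Bogan.

```python
def pose_score(probes, field):
    """Sum the dot product of each probe's vector with the field vector
    stored at that probe's (x, y, z) location.

    probes: list of ((x, y, z), (dx, dy, dz)) pairs.
    field:  dict mapping an (x, y, z) location to a (fx, fy, fz) vector.
    """
    total = 0.0
    for location, direction in probes:
        field_vector = field.get(location)
        if field_vector is None:
            # Assumption: a probe landing outside the field adds no score.
            continue
        total += sum(p * f for p, f in zip(direction, field_vector))
    return total
```

Under Bogan's method 900 as quoted, a score of this kind would then be compared against a predetermined threshold to decide whether the tested pose is stored for subsequent refinement.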
Claim 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Raif et al. (US 9430872) in view of Sangster (US 20100156834)  and Verma et al. (US 20060061566).
Re claim 11, Raif and Sangster teach claim 1.  Furthermore, Sangster teaches receiving sensor data (see [0040] & [0043], camera 127, as a sensor that takes photographs) and (see [0106], geo-tag metadata, such as capture location data from the sensor).  For motivation, see claim 1.
Raif and Sangster do not explicitly teach overlaying one or more objects on an output image using the 3D point cloud and the output image.
However, Verma teaches overlaying one or more objects on an output image using the 3D point cloud and the output image ([0027] FIG. 9 depicts a functional block diagram of the modeling software 114. The LIDAR point cloud (or a portion thereof) is applied to a three-dimensional geometry processing module 902 that creates a geometric model directly from the point cloud. A solid model composition module 904 applies solid modeling processes to remove "inner" attributes and structures of the modeled objects within the scene, i.e., objects such as buildings are modeled as geometric shapes with planar sides, roofs, and the like. The interactive model editing module 906 allows a user to edit the model to fit the point cloud data and/or remove any anomalous attributes of the model. In one embodiment, the module 906 overlays the model as a translucent image upon the original point cloud and a user may alter the model to better fit the point cloud. The output of the module 906 is an untextured geometric model of the scene).  Verma teaches overlaying one or more objects on an output image using the 3D point cloud and the output image (objects modeled as geometric shapes overlay as a translucent output image upon the point cloud).
Raif, Sangster, and Verma teach claim 11.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Raif and Sangster’s image analysis system using point clouds to explicitly include overlaying objects on an output image, as taught by Verma, as the references are in the analogous art of point cloud based computer vision systems.  An advantage of the modification is that it achieves the result of using the point cloud for 3D modeling.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Raif et al. (US 9430872) in view of Sangster (US 20100156834) and Trenholm et al. (US 20190138786).
Re claim 19, Raif teaches a non-transitory computer readable medium including instructions that when executed are configured to perform (see col 11, lines 4-19, non-transitory storage device).
identifying a plurality of images corresponding to a geographic area (see claim 1, receiving a first plurality of passive images, each of the passive images…corresponding to a geographic location).
wherein the plurality of images include image descriptors adaptable into a spatial relationship based on positional information associated with the plurality of images (see col 3, lines 22-47, wherein metadata and the imagery itself are used for predicting point cloud quality, metrics derived from the imagery such as cloud cover percentages, intensity variance, image-to-image correlation and covariance), (see claim 1, receiving a first plurality of passive images, each of the passive images…corresponding to a geographic location…obtain a second plurality of passive images, the second plurality of passive images being a subset of the first plurality of passive images).  Hence, the plurality of passive images are in a spatial relationship based on positional information (corresponding to a geographic location/position) and (see col 6, lines 22-32, in reference to Fig. 3, item 142, point cloud parameters, wherein parameters include the latitude, longitude, and radius of desired point cloud).
selecting a set of neighboring images from the plurality of images using a pairing factor 
(see col 6, lines 22-54, wherein at 144, images downloaded at 140 are chipped and only those images that fully cover the view under review are kept), (see col 4, line 39-51, “When calculating a tiepoint via cross correlation, a candidate pixel location from a reference image is chosen, and a small region around that candidate pixel location is chipped from the "reference" image (called a template window). Then, the template window "slides" over a search region in the "reference complement" image, generating a correlation score at each candidate location. The correlation scores comprise a 2D correlation surface, where the highest score can be found at the location where the reference candidate pixel location best matches a location in the reference complement image. This pairing of a pixel location in the reference image and a pixel location in the reference complement image is a tiepoint”), and (see col 6, lines 33-38, “At 144, images downloaded at 140 are chipped and only those images that fully cover the view under review are kept. The images that fully cover the view under review then become the set of imagery for performance prediction processing. In addition, at 144, metadata is extracted from each image”)
calculating point matches based on the set of neighboring images and the positional information, (in reference to Fig. 3, see col 6, lines 39-46: At 146, the images are clustered and a performance prediction is made for each cluster. In some embodiments, a first pass is made based on a metadata-based performance prediction and a subset of image clusters is identified. That subset of image clusters then receives the more computationally intensive correlator-based performance evaluation. One or more image clusters is identified and passed to 148 to be used to build a point cloud), (see col 7, lines 58-col 8, line 11: “a method for selecting a subset of images to use in building a point cloud…five images selected from available and appropriate images…list of all possible combinations of the five images selected from available and appropriate images…metadata-based performance score…Next, we compute the correlator-based performance score. As noted above, this calculation is computationally intensive so, in the example embodiment shown in FIG. 6…), (see claim 1, receiving a first plurality of passive images, each of the passive images…corresponding to a geographic location), and (see col 4, line 39-60, wherein correlation scores are generated at tiepoints, pairing of a pixel location in the reference image and a pixel location in the reference complement image).
Raif teaches calculating point matches (pairing of pixel points of images and scoring) based on the set of neighboring images (subset of images selected) and the positional information (images correspond to a geographic location and are thus based on being positioned in the geographic location).
and constructing, using the processor, a three-dimensional point cloud for at least a portion of the geographic area, from the point matches, using the image descriptors from the set of neighboring images (in reference to Fig. 3, see col 6, lines 39-46: At 146, the images are clustered and a performance prediction is made for each cluster. In some embodiments, a first pass is made based on a metadata-based performance prediction and a subset of image clusters is identified. That subset of image clusters then receives the more computationally intensive correlator-based performance evaluation. One or more image clusters is identified and passed to 148 to be used to build a point cloud), (see col 6, lines 18-54, in reference to Fig. 3, wherein at 148 a point cloud is built from image preparation 144 including relationships using image descriptors, and 146, performance predictions using point matches from the set of neighboring images in image directory 140 with geographic area information such as point cloud parameters like latitude, longitude, and radius of desired point cloud), and (see claim 1, generating point clouds from passive images…corresponding to geographic location).
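For illustration only, the tiepoint search quoted from Raif above (col 4, lines 39-51) can be sketched as a template window sliding over a search image, with the highest correlation score marking the paired location.  The sketch assumes normalized cross-correlation as the scoring function and indexes each window by its top-left corner; Raif refers only to "cross correlation" and to a region around the candidate pixel, so both choices, along with the function names `ncc`, `chip`, and `find_tiepoint`, are assumptions of this sketch.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat window carries no correlation information
    return sum(x * y for x, y in zip(da, db)) / denom


def chip(image, row, col, size):
    """Flatten the size x size window whose top-left corner is (row, col)."""
    return [image[r][c]
            for r in range(row, row + size)
            for c in range(col, col + size)]


def find_tiepoint(reference, complement, ref_row, ref_col, size):
    """Slide the template over the complement image and keep the best score."""
    template = chip(reference, ref_row, ref_col, size)
    best_score, best_loc = -2.0, None
    for r in range(len(complement) - size + 1):
        for c in range(len(complement[0]) - size + 1):
            score = ncc(template, chip(complement, r, c, size))
            if score > best_score:
                best_score, best_loc = score, (r, c)
    # The tiepoint pairs a reference location with a complement location.
    return (ref_row, ref_col), best_loc, best_score


reference = [[0, 0, 0, 0],
             [0, 9, 1, 0],
             [0, 2, 7, 0],
             [0, 0, 0, 0]]
complement = [[0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 9, 1],
              [0, 0, 0, 2, 7],
              [0, 0, 0, 0, 0]]
ref_loc, match_loc, score = find_tiepoint(reference, complement, 1, 1, 2)
```

Here the 2 x 2 pattern chipped from the reference image at (1, 1) reappears in the complement image at (2, 3), so the search pairs those two locations as a tiepoint.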
Raif does not explicitly teach the pairing factor includes a time component to limit the set of neighboring images and a position component to limit the set of neighboring images.
However, Sangster teaches selecting a set of neighboring images from the plurality of images using a pairing factor including a time component to limit the set of neighboring images and a position component to limit the set of neighboring images (see [0097], grouping into a smaller subset of images compared to an original collection), (see [0105] & [0109], in reference to Fig. 7, wherein temporal classification of images can be carried out based on the range of timestamps of images as shown in Fig. 7, item 705), (see [0112], wherein relevant images from a collection of images are selected and arranged in groups based on temporal events…If the image includes timestamps, these are used as factors to group the images), (see [0105] & [0107], in reference to Fig. 7, item 706, wherein for each image an entry of the geographic coordinates related to the image is shown), (see [0112], wherein relevant images from a collection of images are selected and arranged in groups based on spatial factors such as geographic location…If the image includes geotag metadata, that is used as a factor to group the images based on position information, making them neighbors in geometric space), and (see [0005] & [0114], selecting and grouping a subset of images that are more targeted to be of interest to the user for further processing, and thus limiting the set of images).  For motivation, see claim 1.
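For illustration only, a pairing factor with a time component and a position component can be sketched as a filter that keeps only images whose timestamp and geotag fall within chosen limits of an anchor image.  The record type `ImageRecord`, the function `select_neighbors`, the threshold values, and the use of the haversine great-circle distance are hypothetical choices for this sketch; neither Sangster nor the claim language specifies them.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class ImageRecord:
    name: str
    timestamp: float  # capture time in seconds (hypothetical field)
    lat: float        # geotag latitude in degrees
    lon: float        # geotag longitude in degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of mean Earth radius."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def select_neighbors(anchor, images, max_seconds, max_meters):
    """Keep images that satisfy both the time and the position component."""
    neighbors = []
    for img in images:
        if img is anchor:
            continue
        close_in_time = abs(img.timestamp - anchor.timestamp) <= max_seconds
        close_in_space = haversine_m(anchor.lat, anchor.lon,
                                     img.lat, img.lon) <= max_meters
        if close_in_time and close_in_space:
            neighbors.append(img)
    return neighbors


anchor = ImageRecord("anchor", 0.0, 40.0, -105.0)
near = ImageRecord("near", 60.0, 40.0001, -105.0)           # about 11 m away
stale = ImageRecord("stale", 100000.0, 40.0001, -105.0)     # too late
far = ImageRecord("far", 30.0, 41.0, -105.0)                # about 111 km away
selected = select_neighbors(anchor, [anchor, near, stale, far], 3600.0, 100.0)
```

Each component independently limits the set: the image taken too late and the image taken too far away are both excluded, leaving only the image that is near in both time and position.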
Raif and Sangster do not explicitly teach receiving sensor data from a mobile device and calculating a position based on the sensor data and the 3D point cloud.
However, Trenholm teaches receiving sensor data from a mobile device (see [0047], image device such as a mobile device).
	Calculating a position based on the sensor data and the 3D point cloud ([0061] Turning first to FIG. 4, the steps of pre-processing 301 and feature detection/extraction 302 are shown in greater detail. The steps shown in FIG. 4 correspond to an embodiment wherein image device 102 is a video camera; however, with appropriate modifications image device 102 could be any suitable imaging device. The video camera 102 acquires a video sequence of a scene, the scene containing an object of interest, and the video sequence is provided to system 100. At block 401, the video sequence is decomposed into a plurality of images. At block 402, metadata such as EXIF data/tags are inserted into the plurality of images. Metadata may include resolution, lens information, focal length, sensor width, or the like. Metadata/EXIF data can be used to extrapolate camera calibration in order to perform a 3D spatial alignment procedure, which determines the relative location of the image device 102 in the 3D point cloud being reconstructed and estimates the camera poses (i.e. the position and orientation of the image device 102 when an image was acquired). At block 403, feature detection and description is performed on each image to extract features. Feature detection can be performed by examining key points in an image. Key points represent location in an image that a surrounded by distinctive texture; key points are preferably stably defined in the images, scalable, and reproducible under varying imaging conditions. Key points may be selected that have high repeatability across multiple images in an image set due to invariance to changes in illumination, image noise, geometric transformation such as scaling, translation, shearing, and rotation. 
Feature detection may utilize binary descriptors; for example, binary robust independent elementary features (BRIEF), binary robust invariant scalable keypoints (BRISK), oriented fast and rotated BRIEF (ORB), Accelerated KAZE (AKAZE), fast retina keypoint (FREAK), or other techniques. At block 404, the features can be mapped across the image set in order to reconstruct the 3D image. One or more feature descriptor techniques may be used to determine which key points in various images in the image set are 2D representations of the same 3D point; for example, gradient location and orientation histogram (GLOH), speeded-up robust features (SURF), scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG), or the like. This can be done by computing a feature vector or feature descriptor with local characteristics to describe a local patch. Feature descriptors are matched between different images in the image set by associating key points from one image to another in the set. At block 405, once all the feature descriptors are matched, a global map of feature visibility among views can be created) and ([0062] Referring now to FIG. 5, shown therein are the steps of optimization 303 and increasing point cloud density 304 of method 300 in greater detail, in accordance with an embodiment. At block 501, a non-linear optimization step called "bundle adjustment" is performed on the image set to jointly refine relative poses of the image devices 102 and the 3D position of points. This step provides information regarding (i) the location of the image devices 102 and their orientation in a local reference frame, which may be determined with respect to a reference image device, and (ii) where a given image device 102 is with respect to the 3D object--i.e. what was the location and orientation of that particular imaging device to create that particular 2D image on which the features were detected (at step 302). 
Bundle adjustment may be performed incrementally or globally to minimize projection errors of points onto an image given an estimated pose, with respect to camera matrices and key points. From this, a sparse point cloud is produced at block 502. Upon bundle adjustment, knowing the orientation of the images, a texture-mapped dense 3D point cloud is created at block 503. The point density may also be increased by interpolating the sparse point cloud. At block 504, an even denser point cloud may be constructed via patch match or other similar technique, whereby regions are matched around features. This may be done using normalized cross correlation or other advanced method(s). Patches around different keypoints are compared across multiple views based on similarity. If the similarity of the patches from multiple views is within a threshold, it can be added to the point cloud to fill it out. These patches propagate to the point cloud and soon the sparse structure becomes dense enough to be able to make out what is in the scene. In some scenarios, it may not be possible to handle a large number of images for a global 3D scene reconstruction due to limitation on computational or memory resources. In such a scenario, the input images may be decomposed into set of image clusters of manageable size based on their proximity to other camera views, as determined by the bundle adjustment process. A dense point cloud can then be generated for each cluster independently and in parallel. At block 505, multiview clustering is performed, wherein the union of reconstructions from all the clusters recreates the entire 3D image).
Trenholm teaches using a mobile device to receive sensor data such as orientation and position, and using a sparse point cloud to generate a dense point cloud from the position data.
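For illustration only, the descriptor matching step quoted from Trenholm above ([0061], associating key points from one image to another) can be sketched for binary descriptors such as BRIEF or ORB, which are conventionally compared by Hamming distance.  The integer bit-string representation, the mutual nearest-neighbor check, the distance threshold, and the function names `hamming` and `match_descriptors` are assumptions of this sketch and are not taken from Trenholm.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def match_descriptors(desc_a, desc_b, max_distance):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are mutual nearest
    neighbors and differ by at most max_distance bits."""
    if not desc_a or not desc_b:
        return []
    matches = []
    for i, da in enumerate(desc_a):
        # Nearest descriptor in image B for key point i of image A.
        j = min(range(len(desc_b)), key=lambda k: hamming(da, desc_b[k]))
        # Cross-check: the nearest descriptor in image A for key point j
        # of image B must be key point i again.
        i_back = min(range(len(desc_a)),
                     key=lambda k: hamming(desc_a[k], desc_b[j]))
        if i_back == i and hamming(da, desc_b[j]) <= max_distance:
            matches.append((i, j))
    return matches


image_a = [0b11110000, 0b00001111, 0b10101010]
image_b = [0b00001110, 0b11110001, 0b01010101]
pairs = match_descriptors(image_a, image_b, 2)
```

The third descriptor of the first image has no counterpart within the threshold and fails the cross-check, so only two key point pairs are reported as 2D observations of the same 3D point.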
Raif, Sangster, and Trenholm teach claim 19.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Raif and Sangster’s image processing system to explicitly include the use of a mobile device, as taught by Trenholm, as the references are in the analogous art of point cloud generation from captured sensor image data.  An advantage of the modification is that it achieves the result of using different sensor types such as mobile devices for capturing image data for image processing and point cloud generation/utilization.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Peter Hoang whose telephone number is (571)270-1346. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PETER HOANG/             Primary Examiner, Art Unit 2616