Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action
1.	The Examiner acknowledges the applicant’s amendment filed June 10, 2021.  At this point claims 1-4, 7-14, 19-20 and 23 are pending in the instant application and ready for examination by the Examiner.

2.	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on June 10, 2021 has been entered.

Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claims 1-4, 7, 10, 12-14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Park in view of Syeda, in view of Se, in view of Savvides, in view of Meier, and further in view of Venkataraman. (U.S. Patent Publication 20140222425, referred to as Park; U.S. Patent 6578040, referred to as Syeda; U.S. Patent Publication 20060221072, referred to as Se; U.S. Patent Publication 20140226855, referred to as Savvides; U.S. Patent Publication 20130177235, referred to as Meier; U.S. Patent Publication 20140333787, referred to as Venkataraman.)

Claim 1
Park discloses a method comprising: receiving multi-modal sensor data from a plurality of sensors having different modalities comprising an imaging sensor configured to capture a video input (Park, 0057; …As illustrated in FIG. 4, in the case of multi-modal deep learning based on acoustic signal (audio input) and image signals ( video input),…) and a ranging sensor configured to capture a depth input (Park, 0040; In the case where the 3D information acquisition unit 122 includes the 3D geometric information estimation unit, the 3D geometric information estimation unit may be configured to include one or more of existing various range sensors and depth sensors;….);…. applying a model comprising a deep convolutional neural network. (Park, 0055; Hereinafter, a process of acquiring integrated feature of acoustic signal and image by using a multi-modal deep learning scheme in the aforementioned speech recognition learning methods according to the embodiment will be described in detail. FIG. 4 is a schematic diagram for explaining deep learning. The aforementioned speech recognition learning methods are to perform speech recognition learning by using deep learning through one of or a combination of DNN ( Deep Neural Network), DBN ( Deep Belief Network), and DCN ( Deep Convolutional Network).)
Park does not disclose expressly deriving a plurality of optical flow data defining a change in scene dynamics between at least two frames of the video input based on a motion field change.
Syeda discloses deriving a plurality of optical flow data defining a change in scene dynamics between at least two frames of the video input based on a motion field change. (Syeda, c9:8-20; As shown in FIG. 3, the invention identifies likely matches of electronic slide images and video stream images by detecting differences in video stream images using a simple color histogram, 30;…) It would have been obvious to one having ordinary skill in the art, having the teachings of Park and Syeda before him before the effective filing date of the claimed invention, to modify Park to incorporate the specific method of detecting changes within a video input of Syeda. Given the advantage that a faster rate of change provides a closer indication of an object, one having ordinary skill in the art would have been motivated to make this obvious modification.
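For illustration only, the color-histogram differencing quoted from Syeda above can be sketched as follows. This is a minimal sketch and not Syeda's disclosed implementation: the function names, the frame representation (rows of (r, g, b) tuples), the bin count and the decision threshold are all assumptions introduced here.

```python
from collections import Counter


def color_histogram(frame, bins=4):
    """Count pixels per quantized (r, g, b) bin.

    frame: list of rows, each row a list of (r, g, b) tuples in 0..255.
    bins is assumed to divide 256 evenly (hypothetical parameter).
    """
    step = 256 // bins
    return Counter(
        (r // step, g // step, b // step) for row in frame for (r, g, b) in row
    )


def histogram_difference(frame_a, frame_b, bins=4):
    """Normalized L1 distance between two frame histograms, in [0, 1]."""
    hist_a = color_histogram(frame_a, bins)
    hist_b = color_histogram(frame_b, bins)
    total = sum(hist_a.values()) + sum(hist_b.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for bins that are absent from one histogram.
    diff = sum(abs(hist_a[k] - hist_b[k]) for k in set(hist_a) | set(hist_b))
    return diff / total


def frames_differ(frame_a, frame_b, threshold=0.25, bins=4):
    """Flag a change between two frames; the threshold is an assumed value."""
    return histogram_difference(frame_a, frame_b, bins) > threshold
```

A histogram comparison of this kind reports that two frames differ but carries no direction or magnitude of motion, so it is shown here only to make the cited passage concrete.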
Park and Syeda do not disclose expressly fusing multi-modal sensor data from the plurality of sensors having different modalities and the plurality of optical flow 
Se discloses fusing multi-modal sensor data from the plurality of sensors having different modalities and the plurality of optical flow data (Se, 0006; One sensor (especially with fixed optics and at one stand-off distance) cannot provide the resolution and depth of field required for the whole range of operations, e.g., room model and blood spatter analysis. Data from multi-modal sensors has to be fused together, e.g., room model and close-up images of fingerprints.);…. outputting a perception indication of the post-processing result to a decision making process for autonomous navigation of a vehicle. (Se, 0025; The system can be used for visual odometry, for automatically computing relative position and orientation using sequences of images for autonomous vehicle navigation.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda and Se before him before the effective filing date of the claimed invention, to modify Park and Syeda to incorporate fusing data for navigational purposes of Se. Given the advantage that all incoming data must be synced (fused) to obtain a current position determination, one having ordinary skill in the art would have been motivated to make this obvious modification.
Park, Syeda and Se do not disclose expressly detecting at least one region of interest in the multi-modal sensor data; detecting one or more patches of interest in the multi-modal sensor data based on detecting the at least one region of interest.
Savvides discloses detecting at least one region of interest in the multi-modal sensor data; detecting one or more patches of interest in the multi-modal sensor data based on detecting the at least one region of interest. (Savvides, 0056; An advantage 
Park, Syeda, Se and Savvides do not disclose expressly to the one or more patches of interest wherein the model is trained using a supervised learning process for auto-registration of the multi-modal sensor data and outputs classification values.
Meier discloses to the one or more patches of interest wherein the model is trained using a supervised learning process for auto-registration of the multi-modal sensor data and outputs classification values. (Meier, 0040; A processing node within the network may represent its activation as an analog value, a binary activation state, a probability, a belief distribution, a discrete state on N possibilities, a point process over time, or any representation appropriate to the supervised learning algorithm employed. EC: Patches of interest is addressed by Savvides.) It would have been obvious to one 
Park, Syeda, Se, Savvides and Meier do not disclose expressly performing post-processing of a result of applying the model to produce a post-processing result for the one or more patches of interest wherein the post-processing comprises aggregating the classification values and identifying a dominating label representing an alignment offset between at least two of the sensors to determine whether a misalignment exists for a same field of view.
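For illustration only, the recited post-processing (aggregating per-patch classification values and identifying a dominating label) can be sketched as a majority vote. This is a minimal sketch under stated assumptions, not the method of the application or of any cited reference: the function names, the convention that label 0 means "aligned", and the minimum vote share are hypothetical.

```python
from collections import Counter


def dominating_label(patch_labels):
    """Return (label, share) for the most frequent per-patch label.

    patch_labels: one classification value per patch of interest, where each
    value is assumed to encode an alignment offset between two sensors.
    """
    counts = Counter(patch_labels)
    if not counts:
        return None, 0.0
    label, votes = counts.most_common(1)[0]
    return label, votes / len(patch_labels)


def misalignment_detected(patch_labels, aligned_label=0, min_share=0.5):
    """Report a misalignment when a non-aligned offset label dominates.

    aligned_label and min_share are assumed values chosen for this sketch.
    """
    label, share = dominating_label(patch_labels)
    return label is not None and label != aligned_label and share >= min_share
```

With per-patch labels [0, 2, 2, 2, -1], the dominating label is 2 with a share of 0.6, so the sketch reports a misalignment.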
Venkataraman discloses performing post-processing of a result of applying the model to produce a post-processing result for the one or more patches of interest wherein the post-processing comprises aggregating the classification values and identifying a dominating label representing an alignment offset between at least two of the sensors to determine whether a misalignment exists for a same field of view. (Venkataraman, 0034, 0044, 0069; ‘The camera array 100 is fabricated on a semiconductor chip to include a plurality of imagers 1A through NM. Each of the imagers 1A through NM may include a plurality of pixels (e.g., 0.32 Mega pixels).’ and ‘The number of imagers in the camera array may be determined based on, among other factors, (i) resolution, (ii) parallax, (iii) sensitivity, and (iv) dynamic range.’ with After the image is processed performing post-processing by the upstream pipeline processing module 510, an image pixel one or more patches of interest correlation module 514 

Claim 2
Park, Syeda and Se do not disclose expressly wherein detecting the at least one region of interest comprises performing edge detection on one or more images in the multi-modal sensor data.
Savvides discloses wherein detecting the at least one region of interest comprises performing edge detection on one or more images in the multi-modal sensor data. (Savvides, 0015; FIG. 9 illustrates the fact that that since frame differencing leads to a large number of ON pixels on the person's edges, whenever the histogram's parameters are not in the optimal or nearly optimal range several artifacts may appear.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se and Savvides before him before the effective filing date of the claimed invention, to modify Park, Syeda and Se to incorporate considering a region of interest, an edge within a region of interest, pixels and thresholds which are associated with images and edges, employing encoders and hardware, of Savvides. Given the advantage of lowering the computational costs by using an area of interest, edges are determining boundaries, a large change might indicate the presence of an edge, 

Claim 3
Park, Syeda, Se and Savvides do not disclose expressly wherein the edge detection comprises occlusion edge detection.
Meier discloses wherein the edge detection comprises occlusion edge detection. (Meier, 0135;  The x-y plot of a cross section of an estimate of the probability of an occluding edge 1270 indicates the estimated probability of occlusion on the vertical axis and the horizontal position along the views 1200 and 1250 along the x axis.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides and Meier before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se and Savvides to incorporate supervised or non-supervised learning of Meier. Given the advantage of the option of manual or automatic training, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 4
Park, Syeda and Se do not disclose expressly wherein detecting the one or more patches of interest comprises applying a threshold on a percentage of pixels with edges in a given patch.
Savvides discloses wherein detecting the one or more patches of interest comprises applying a threshold on a percentage of pixels with edges in a given patch. (Savvides, 0084, 0076; ‘As a direct consequence of this unpredictability, in one aspect, designers of computer vision systems generally elect to overestimate or underestimate the threshold value based on whether robustness is to be prioritized against false positives or false negatives, respectively.’ and ‘It should be appreciated that despite quality or refinement, a background modeling technique can often misclassify pixels in a variety of scenarios, due to a variety of factors, such as: [0077] (1) natural oscillations in pixel intensity; [0078] (2) changes in lighting (e.g., those imposed by shadows or the movement of the sun); [0079] (3) presence of repetitive background motion (for example, waving foliage or other forms of dynamic foreground); [0080] (4) sudden, unexpected changes in position of static objects in the scene; and [0081] (5) similarity of pixels in foreground objects to pixels from the background.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se and Savvides before him before the effective filing date of the claimed invention, to modify Park, Syeda and Se to incorporate considering a region of interest, an edge within a region of interest, pixels and thresholds which are associated with images and edges, employing encoders and hardware, of Savvides. Given the advantage of lowering the computational costs by using an area of interest, edges are determining boundaries, a large change might indicate the presence of an edge, hardware for real world employment and encoders to handle noise within input data, one having ordinary skill in the art would have been motivated to make this obvious modification.
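For illustration only, the limitation of claim 4 (applying a threshold on a percentage of pixels with edges in a given patch) can be sketched as follows. This is a minimal sketch, not taken from Savvides or from the application: the function name, the binary edge-map representation, the non-overlapping square tiling and the parameter names are assumptions.

```python
def patches_of_interest(edge_map, patch_size, min_edge_fraction):
    """Return top-left (row, col) corners of patches with enough edge pixels.

    edge_map: list of equal-length rows of 0/1 values, where 1 marks an edge
    pixel. Patches are non-overlapping squares; partial patches at the right
    and bottom borders are skipped (an assumption of this sketch).
    """
    height = len(edge_map)
    width = len(edge_map[0]) if height else 0
    selected = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            edge_count = sum(
                edge_map[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            )
            # Keep the patch when its fraction of edge pixels meets the threshold.
            if edge_count / (patch_size * patch_size) >= min_edge_fraction:
                selected.append((top, left))
    return selected
```

Restricting later processing to the selected patches is one way a threshold of this kind could reduce computational cost, which is the advantage relied on in the rejection.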

Claim 7
Park discloses receiving multi-modal sensor data from a plurality of sensors having different modalities comprising an imaging sensor configured to capture a video input (Park, 0057; …As illustrated in FIG. 4, in the case of multi-modal deep learning based on acoustic signal (audio input) and image signals ( video input),…)  and a ranging sensor configured to capture a depth input (Park, 0040; In the case where the 3D information acquisition unit 122 includes the 3D geometric information estimation unit, the 3D geometric information estimation unit may be configured to include one or more of existing various range sensors and depth sensors;….);…. applying a model comprising a deep convolutional neural network. (Park, 0055; Hereinafter, a process of acquiring integrated feature of acoustic signal and image by using a multi-modal deep learning scheme in the aforementioned speech recognition learning methods according to the embodiment will be described in detail. FIG. 4 is a schematic diagram for explaining deep learning. The aforementioned speech recognition learning methods are to perform speech recognition learning by using deep learning through one of or a combination of DNN ( Deep Neural Network), DBN ( Deep Belief Network), and DCN ( Deep Convolutional Network).)
Park does not disclose expressly deriving a plurality of optical flow data defining a change in scene dynamics between at least two frames of the video input based on a motion field change.
Syeda discloses deriving a plurality of optical flow data defining a change in scene dynamics between at least two frames of the video input based on a motion field change. (Syeda, c9:8-20; As shown in FIG. 3, the invention identifies likely matches of 
Park and Syeda do not disclose expressly fusing multi-modal sensor data from the plurality of sensors having different modalities and the plurality of optical flow data;…. outputting a perception indication of the post-processing result to a decision making process for autonomous navigation of a vehicle.
Se discloses fusing multi-modal sensor data from the plurality of sensors having different modalities and the plurality of optical flow data (Se, 0006; One sensor (especially with fixed optics and at one stand-off distance) cannot provide the resolution and depth of field required for the whole range of operations, e.g., room model and blood spatter analysis. Data from multi-modal sensors has to be fused together, e.g., room model and close-up images of fingerprints.);…. outputting a perception indication of the post-processing result to a decision making process for autonomous navigation of a vehicle. (Se, 0025; The system can be used for visual odometry, for automatically computing relative position and orientation using sequences of images for autonomous vehicle navigation.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda and Se before him before the effective filing date of 
Park, Syeda and Se do not disclose expressly detecting at least one region of interest in the multi-modal sensor data; detecting one or more patches of interest in the multi-modal sensor data based on detecting the at least one region of interest;…. using a deep auto-encoder and outputs reconstruction errors.
Savvides discloses detecting at least one region of interest in the multi-modal sensor data; detecting one or more patches of interest in the multi-modal sensor data based on detecting the at least one region of interest (Savvides, 0056; An advantage of uninstrumented solutions is that they present a lower barrier-of-entry when used in public spaces. That is, in theory any person who enters the sensing area can be automatically detected by one or more uninstrumented sensors placed, for example, on the environment's infrastructure.);…. using a deep auto-encoder and outputs reconstruction errors. (Savvides, 0254; Examples of such techniques include, but are not limited to, expert systems, case based reasoning, Bayesian networks, behavior based AI, neural networks, fuzzy systems, evolutionary computation (e.g., genetic algorithms), swarm intelligence (e.g., ant algorithms), and hybrid intelligent systems (e.g., Expert inference rules generated through a neural network or production rules from statistical learning). EC: an auto-encoder is merely a neural network which has the same number of inputs as outputs. It can be seen as a copier design. It is often used with images as input.) It would have been obvious to one having ordinary skill in the art, 
Park, Syeda, Se and Savvides do not disclose expressly to the one or more patches of interest, wherein the model is trained using an unsupervised learning process.
Meier discloses to the one or more patches of interest, wherein the model is trained using an unsupervised learning process. (Meier, 0070; ‘FIG. 2 illustrates a flow chart of a conceptual process 200 used by some embodiments to implement a hybrid method that uses supervised and unsupervised learning.’ EC: Patches of interest is addressed by Savvides.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides and Meier before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se and Savvides to incorporate supervised or non-supervised learning of Meier. Given the advantage of the option of manual or automatic training, one having ordinary skill in the art would have been motivated to make this obvious modification.

Venkataraman discloses performing post-processing of a result of applying the model to produce a post-processing result for the one or more patches of interest wherein the post-processing comprises aggregating the reconstruction errors and identifying a misalignment between at least two of the sensors based on the aggregation of the reconstruction errors for a same field of view. (Venkataraman, 0034, 0044, 0069; ‘The camera array 100 is fabricated on a semiconductor chip to include a plurality of imagers 1A through NM. Each of the imagers 1A through NM may include a plurality of pixels (e.g., 0.32 Mega pixels).’ and ‘The number of imagers in the camera array may be determined based on, among other factors, (i) resolution, (ii) parallax, (iii) sensitivity, and (iv) dynamic range.’ with After the image is processed performing post-processing by the upstream pipeline processing module 510, an image pixel one or more patches of interest correlation module 514 performs calculation post-processing comprises aggregating the classification values to account for parallax that becomes more apparent as objects being captured approaches to the camera array identifying a dominating label representing an alignment offset between at least two of the sensors. Specifically, the image pixel correlation module 514 aligns portions of images captured by different imagers to compensate for the parallax. In one embodiment, the image pixel correlation module 514 compares the difference between the average values of 

Claim 10
Park, Syeda, Se and Savvides do not disclose expressly wherein the model is trained using a supervised learning process for occlusion edge labeling of the multi-modal sensor data and outputs classification values.
Meier discloses wherein the model is trained using a supervised learning process for occlusion edge labeling of the multi-modal sensor data and outputs classification values. (Meier, 0070, 0156; ‘FIG. 2 illustrates a flow chart of a conceptual process 200 used by some embodiments to implement a hybrid method that uses supervised and unsupervised learning.’ and ‘FIG. 15 illustrates a sequence of images 1500 used by some embodiments to estimate occluding edges of a transforming object and an x-y plot of occlusion error over time.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides and Meier before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se and Savvides to incorporate supervised or non-supervised learning of Meier. Given the advantage of the option of manual or automatic training, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 12
Park discloses a system comprising:…. receive multi-modal sensor data from a plurality of sensors having different modalities comprising an imaging sensor configured to capture a video input (Park, 0057; …As illustrated in FIG. 4, in the case of multi-modal deep learning based on acoustic signal (audio input) and image signals ( video input),…) and a ranging sensor configured to capture a depth input (Park, 0040; In the case where the 3D information acquisition unit 122 includes the 3D geometric information estimation unit, the 3D geometric information estimation unit may be configured to include one or more of existing various range sensors and depth sensors;….);…. apply a model comprising a deep convolutional neural network. (Park, 0055; Hereinafter, a process of acquiring integrated feature of acoustic signal and image by using a multi-modal deep learning scheme in the aforementioned speech recognition learning methods according to the embodiment will be described in detail. FIG. 4 is a schematic diagram for explaining deep learning. The aforementioned speech recognition learning methods are to perform speech recognition learning by using deep learning through one of or a combination of DNN ( Deep Neural Network), DBN ( Deep Belief Network), and DCN ( Deep Convolutional Network).)
Park does not disclose expressly derive a plurality of optical flow data defining a change in scene dynamics between at least two frames of the video input based on a motion field change.
Syeda discloses derive a plurality of optical flow data defining a change in scene dynamics between at least two frames of the video input based on a motion field change. (Syeda, c9:8-20; As shown in FIG. 3, the invention identifies likely matches of electronic slide images and video stream images by detecting differences in video stream images using a simple color histogram, 30;…) It would have been obvious to one having ordinary skill in the art, having the teachings of Park and Syeda before him before the effective filing date of the claimed invention, to modify Park to incorporate a method for a specific method of changes within a video input of Syeda. Given the 
Park and Syeda do not disclose expressly fuse multi-modal sensor data from the plurality of sensors and the plurality of optical flow data …. output a perception indication of the post-processing result to a decision making process for autonomous navigation of a vehicle.
Se discloses fuse multi-modal sensor data from the plurality of sensors and the plurality of optical flow data (Se, 0006; One sensor (especially with fixed optics and at one stand-off distance) cannot provide the resolution and depth of field required for the whole range of operations, e.g., room model and blood spatter analysis. Data from multi-modal sensors has to be fused together, e.g., room model and close-up images of fingerprints.)…. output a perception indication of the post-processing result to a decision making process for autonomous navigation of a vehicle. (Se, 0025; The system can be used for visual odometry, for automatically computing relative position and orientation using sequences of images for autonomous vehicle navigation.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda and Se before him before the effective filing date of the claimed invention, to modify Park and Syeda to incorporate fusing data for navigational purposes of Se. Given the advantage that all incoming data must be synced (fused) to obtain a current position determination, one having ordinary skill in the art would have been motivated to make this obvious modification.

Savvides discloses a sensor system comprising a plurality of sensors having different modalities (Savvides, 0003; Although the needs and design constraints of these environments are expected to vary greatly from application to application, the disclosure identifies five spatio-temporal properties stand out as fundamental to the human-sensing field: (1) human presence, (2) count, (3) location, (4) track (spatio-temporal history), and (5) identity. In one aspect, the disclosure provides a multimodal sensor network that is designed to extract these five fundamental properties. Such network can combine cameras scattered in an environment with inertial sensors (such as accelerometers and magnetometers) present in wearable devices coupled to at least some of the people that may be present in the environment.); and a perception processor configured to (Savvides, 0084, 0040; ‘One exemplary issue, at a practical level, is that pattern recognition can be a substantively more processor-intensive solution than background subtraction.’ And ‘As an example, a unit may be, but is not limited to being, a process running on a processor, a processor, an object, an executable computer program, a thread of execution, a program, a memory (e.g., a hard disc drive), and/or a computer.’):….detect at least one region of interest in the multi-modal sensor data (Savvides, 0056; An advantage of uninstrumented solutions is that they present a lower barrier-of-entry when used in public spaces. That is, in theory any person who enters the sensing area can be automatically detected by one or more uninstrumented sensors placed, for example, on the environment's infrastructure.),…. to at least one detected patch of interest, perform post-processing of a result of applying the model to produce a post-processing result for the at least one detected patch of interest. (Savvides, 0150; Post processing result maps to tracking. 
‘For this, let .theta..sub.K.sup.h compactly denote a track hypothesis, such that …(variables)… the set of all track hypotheses up to frame K. Then we can calculate the following joint probability:…’) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se and Savvides before him before the effective filing date of the claimed invention, to modify Park, Syeda and Se to incorporate considering a region of interest, an edge within a region of interest, pixels and thresholds which are associated with images and edges, employing encoders and hardware, of Savvides. Given the advantage of lowering the computational costs by using an area of interest, edges are determining boundaries, a large change might indicate the presence of an edge, hardware for real world employment and encoders to handle noise within input data, one having ordinary skill in the art would have been motivated to make this obvious modification. 
Park, Syeda, Se and Savvides do not disclose expressly wherein the model is trained using a supervised learning process for auto-registration of the multi-modal sensor data.
Meier discloses wherein the model is trained using a supervised learning process for auto-registration of the multi-modal sensor data. (Meier, 0040; A processing node within the network may represent its activation as an analog value, a binary activation state, a probability, a belief distribution, a discrete state on N possibilities, a point process over time, or any representation appropriate to the supervised learning algorithm employed.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides and Meier before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se and Savvides to incorporate supervised or non-supervised learning of Meier. Given the advantage of the option of manual or automatic training, one having ordinary skill in the art would have been motivated to make this obvious modification.
Park, Syeda, Se, Savvides and Meier do not disclose expressly and outputs classification values, and wherein the post-processing comprises aggregating the classification values and identifying a dominating label representing an alignment offset between at least two of the sensors to determine whether a misalignment exists of a same field of view.
Venkataraman discloses and outputs classification values, and wherein the post-processing comprises aggregating the classification values and identifying a dominating label representing an alignment offset between at least two of the sensors to determine whether a misalignment exists of a same field of view. (Venkataraman, 0034, 0044, 0069; ‘The camera array 100 is fabricated on a semiconductor chip to include a plurality of imagers 1A through NM. Each of the imagers 1A through NM may include a plurality of pixels (e.g., 0.32 Mega pixels).’ and ‘The number of imagers in the camera array may 

Claim 13
Park, Syeda and Se do not disclose expressly wherein the at least one region of interest is detected by performing edge detection on one or more images in the multi-modal sensor data.
Savvides discloses wherein the at least one region of interest is detected by performing edge detection on one or more images in the multi-modal sensor data. (Savvides, 0015; FIG. 9 illustrates the fact that that since frame differencing leads to a large number of ON pixels on the person's edges, whenever the histogram's parameters are not in the optimal or nearly optimal range several artifacts may appear.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se and Savvides before him before the effective filing date of the claimed invention, to modify Park, Syeda and Se to incorporate considering a region of interest, an edge within a region of interest, pixels and thresholds which are associated with images and edges, employing encoders and hardware, of Savvides. Given the 

Claim 14
Park, Syeda and Se do not disclose expressly wherein detection of the one or more patches of interest comprises applying a threshold on a percentage of pixels with edges in a given patch.
Savvides discloses wherein detection of the one or more patches of interest comprises applying a threshold on a percentage of pixels with edges in a given patch. (Savvides, 0084, 0076; ‘As a direct consequence of this unpredictability, in one aspect, designers of computer vision systems generally elect to overestimate or underestimate the threshold value based on whether robustness is to be prioritized against false positives or false negatives, respectively.’ and ‘It should be appreciated that despite quality or refinement, a background modeling technique can often misclassify pixels in a variety of scenarios, due to a variety of factors, such as: [0077] (1) natural oscillations in pixel intensity; [0078] (2) changes in lighting (e.g., those imposed by shadows or the movement of the sun); [0079] (3) presence of repetitive background motion (for example, waving foliage or other forms of dynamic foreground); [0080] (4) sudden, unexpected changes in position of static objects in the scene; and [0081] (5) similarity of pixels in foreground objects to pixels from the background.’) It would have been obvious 

Claim 20
Park, Syeda and Se do not disclose expressly wherein the model.
Savvides discloses wherein the model. (Savvides, 0254; Bayesian networks, behavior based AI, neural networks, fuzzy systems, evolutionary computation (e.g., genetic algorithms), swarm intelligence (e.g., ant algorithms), and hybrid intelligent systems (e.g., Expert inference rules generated through a neural network or production rules from statistical learning).) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se and Savvides before him before the effective filing date of the claimed invention, to modify Park, Syeda and Se to incorporate considering a region of interest, an edge within a region of interest, pixels and thresholds which are associated with images and edges, employing encoders and hardware, of Savvides. Given the advantage of lowering the computational costs by using an area of interest, edges are determining boundaries, a large change might 
Park, Syeda, Se and Savvides do not disclose expressly is trained using a supervised learning process; for occlusion edge labeling of the multi-modal sensor data and outputs classification values.
Meier discloses is trained using a supervised learning process (Meier, 0040; A processing node within the network may represent its activation as an analog value, a binary activation state, a probability, a belief distribution, a discrete state on N possibilities, a point process over time, or any representation appropriate to the supervised learning algorithm employed.); for occlusion edge labeling of the multi-modal sensor data and outputs classification values. (Meier, 0135; The x-y plot of a cross section of an estimate of the probability of an occluding edge 1270 indicates the estimated probability of occlusion on the vertical axis and the horizontal position along the views 1200 and 1250 along the x axis.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides and Meier before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se and Savvides to incorporate supervised or non-supervised learning of Meier. Given the advantage of the option of manual or automatic training, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 8-9 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Park, Syeda, Se, Savvides, Meier and Venkataraman as applied to claims 1-4, 7, 10, 12-14 and 20 above, and further in view of Riederer. (U. S. Patent 8390675, referred to as Riederer)

Claim 8
Park, Syeda, Se, Savvides, Meier and Venkataraman do not disclose expressly wherein the perception indication is a misalignment indicator used to adjust at least one actuator of the at least two sensors having the misalignment.
Riederer discloses wherein the perception indication is a misalignment indicator used to adjust at least one actuator of the at least two sensors having the misalignment. (Riederer, c8:65 through c9:25, ‘12. Means to adjust the vertical alignment of the cameras' images, in this embodiment it is an actuator-driven screw;….’ and ‘Convergence could be adjusted by use of a mechanism with an actuator to deflect either or both of the sensors' mounting and their respective lens and axis.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides, Meier, Venkataraman and Riederer before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se, Savvides, Meier and Venkataraman to incorporate a physical actuator for movement of camera(s) of Riederer. Given the advantage of eliminating the computational costs of digitally adjusting an image for parallax, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 9

Riederer discloses wherein the perception indication is a misalignment indicator used to adjust at least one actuator of the at least two sensors having the misalignment. (Riederer, c8:65 through c9:25, ‘12. Means to adjust the vertical alignment of the cameras' images, in this embodiment it is an actuator-driven screw;….’ and ‘Convergence could be adjusted by use of a mechanism with an actuator to deflect either or both of the sensors' mounting and their respective lens and axis.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides, Meier, Venkataraman and Riederer before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se, Savvides, Meier and Venkataraman to incorporate a physical actuator for movement of camera(s) of Riederer. Given the advantage of eliminating the computational costs of digitally adjusting an image for parallax, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim 19
Park, Syeda, Se, Savvides, Meier and Venkataraman do not disclose expressly wherein the perception indication is a misalignment indicator used to adjust at least one actuator of the at least two sensors having the misalignment.
Riederer discloses wherein the perception indication is a misalignment indicator used to adjust at least one actuator of the at least two sensors having the misalignment. (Riederer, c8:65 through c9:25, ‘12. Means to adjust the vertical alignment of the cameras' images, in this embodiment it is an actuator-driven screw;….’ and ‘Convergence could be adjusted by use of a mechanism with an actuator to deflect either or both of the sensors' mounting and their respective lens and axis.’) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides, Meier, Venkataraman and Riederer before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se, Savvides, Meier and Venkataraman to incorporate a physical actuator for movement of camera(s) of Riederer. Given the advantage of eliminating the computational costs of digitally adjusting an image for parallax, one having ordinary skill in the art would have been motivated to make this obvious modification.

Claim(s) 11 and 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Park, Syeda, Se, Savvides, Meier and Venkataraman as applied to claims 1-4, 7, 10, 12-14 and 20 above, and further in view of Mehrotra. (‘Elements of artificial neural networks’, referred to as Mehrotra)

Claim 11
Park, Syeda, Se and Savvides do not disclose expressly and further wherein the perception indication comprises identification of occlusion edges.
Meier discloses and further wherein the perception indication comprises identification of occlusion edges. (Meier, 0070, 0156; ‘FIG. 2 illustrates a flow chart of a conceptual process 200 used by some embodiments to implement a hybrid method that 
Park, Syeda, Se, Savvides, Meier and Venkataraman do not disclose expressly wherein the post-processing comprises aggregating the classification values and smoothing the classification.
Mehrotra discloses wherein the post-processing comprises aggregating the classification values and smoothing the classification. (Mehrotra, p11 fig1.5, p13-14; ‘Sigmoid functions The most popular node functions used in neural nets are "sigmoid" (S-shaped) functions, whose output is illustrated in figure 1.8. These functions are continuous and differentiable everywhere, are rotationally symmetric about some point (net = c), and asymptotically approach their saturation values (a, b) where x, y, and z are parameters that determine a, b, and c for figure 1.8. The advantage of these functions is that their smoothness makes it easy to devise learning algorithms and understand the behavior of large networks whose nodes compute such functions.’ EC: The claim is describing basic neural network design and functions.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides, Meier, Venkataraman and Mehrotra before him before the effective filing date 

Claim 23
Park, Syeda, Se and Savvides do not disclose expressly and further wherein the perception indication comprises identification of occlusion edges.
Meier discloses and further wherein the perception indication comprises identification of occlusion edges. (Meier, 0135;  The x-y plot of a cross section of an estimate of the probability of an occluding edge 1270 indicates the estimated probability of occlusion on the vertical axis and the horizontal position along the views 1200 and 1250 along the x axis.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides and Meier before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se and Savvides to incorporate supervised or non-supervised learning of Meier. Given the advantage of the option of manual or automatic training, one having ordinary skill in the art would have been motivated to make this obvious modification.
Park, Syeda, Se, Savvides, Meier and Venkataraman do not disclose expressly wherein the post-processing comprises aggregating the classification values and smoothing the classification.
Mehrotra discloses wherein the post-processing comprises aggregating the classification values and smoothing the classification. (Mehrotra, p11 fig1.5, p13-14; ‘Sigmoid functions The most popular node functions used in neural nets are "sigmoid" (S-shaped) functions, whose output is illustrated in figure 1.8. These functions are continuous and differentiable everywhere, are rotationally symmetric about some point (net = c), and asymptotically approach their saturation values (a, b) where x, y, and z are parameters that determine a, b, and c for figure 1.8. The advantage of these functions is that their smoothness makes it easy to devise learning algorithms and understand the behavior of large networks whose nodes compute such functions.’ EC: The claim is describing basic neural network design and functions.) It would have been obvious to one having ordinary skill in the art, having the teachings of Park, Syeda, Se, Savvides, Meier, Venkataraman and Mehrotra before him before the effective filing date of the claimed invention, to modify Park, Syeda, Se, Savvides, Meier and Venkataraman to incorporate background information pertaining to neural networks of Mehrotra. Given the advantage of illustrating a segment of the neural network which occurs within a node of each layer of a neural network which has a smoothing effect, one having ordinary skill in the art would have been motivated to make this obvious modification.
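
For illustration only, and not as part of the cited reference: the sigmoid node function described in the Mehrotra passage might be sketched in Python as follows. The logistic form, the function name sigmoid, and the direct parameterization by the saturation values a and b and the symmetry point c are assumptions made for this sketch, since the passage as quoted does not reproduce the formula.

```python
import math


def sigmoid(net, a=0.0, b=1.0, c=0.0):
    """Logistic sigmoid node function (assumed form, for illustration).

    a -- lower saturation value approached as net -> -infinity
    b -- upper saturation value approached as net -> +infinity
    c -- point of rotational symmetry (net = c)
    """
    z = net - c
    # Evaluate 1 / (1 + exp(-z)) in a numerically stable way
    # so that large |z| does not overflow math.exp.
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    # Rescale the unit sigmoid to the interval (a, b).
    return a + (b - a) * s
```

Under these assumptions the function is smooth everywhere, takes the midpoint value (a + b) / 2 at net = c, and approaches a and b asymptotically, which corresponds to the properties recited in the quoted passage.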


Response to Arguments
4.	Applicant’s arguments filed on June 10, 2021 for claims 1-4, 7-14, 19-20 and 23 have been fully considered but are not persuasive.

5.	Applicant's argument: 
Rejections Under 35 U.S.C. §112
Claims 10-11 and 20-22 have been rejected under 35 USC 112(a) or 35 USC 112 (pre-AIA), first paragraph, as allegedly failing to comply with the written description requirement. Applicant respectfully traverses.
Claims 10-11 and 20 have been amended, and claims 21-22 have been canceled to address the rejection. Withdrawal of the rejection is respectfully requested.

Examiner’s answer:
In light of the amendments, the examiner removes the rejection. 

6.	Applicant's argument: 
Claim 1 has been amended to recite, inter alia, “receiving multi-modal sensor data from a plurality of sensors having different modalities comprising an imaging sensor configured to capture a video input and a ranging sensor configured to capture a depth input; deriving a plurality of optical flow data defining a change in scene dynamics between at least two frames of the video input based on a motion field change; fusing multi-modal sensor data from the plurality of sensors having different modalities and the plurality of optical flow data” and “outputting a perception indication of the post-processing result to a decision-making process for autonomous navigation of a vehicle” in combination with other limitations, which is not taught or suggested by the cited art. 

Examiner’s answer:
This application was originally directed to fusing data. Now the last claim element pertains to navigation abilities. The examiner still reads the application according to its original meaning, and the navigation abilities are considered one of many possible uses.
New art is used to address the limitations cited above.

7.	Applicant's argument: 


Examiner’s answer:
This has been addressed above. In addition, ‘autonomous navigation’ is mentioned only once within the specification [0060], and that mention is concerned only with edge detection. If the claimed invention is only concerned with autonomous navigation, then this should have been stated within the preamble to establish an initial domain of the application, and not in the last claimed element, which is only mentioned once within the specification.

8.	Claims 1-4, 7-14, 19-20 and 23 are rejected.
	

Conclusion	
9.	The prior art of record and not relied upon is considered pertinent to the applicant’s disclosure.
	-Search terms: autonomous navigation, distance, lidar, range finder, video, image

	-U. S. Patent Publication 20110288714: Flohr

Correspondence Information
10.	Any inquiry concerning this information or related to the subject disclosure should be directed to the Examiner Mr. Peter Coughlan, whose telephone number is (571) 272-5990 (Fax 571-273-5990).  The Examiner can be reached on Monday through Friday from 7:15 a.m. to 3:45 p.m.
	If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor Mr. Li Zhen can be reached at (571) 272-3768.  Any response to this office action should be mailed to:
	Commissioner of Patents and Trademarks, 
	Washington, D. C. 20231;
Hand delivered to:
	Receptionist, 
	Customer Service Window, 
	Randolph Building, 
	401 Dulany Street,
	Alexandria, Virginia 22313,
	(located on the first floor of the south side of the Randolph Building);
or faxed to:
	(571) 272-3150 (for formal communications intended for entry.)








/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121