Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Notice to Applicants
This communication is in response to the Application filed on 3/15/2021.
Claims 1-65 are pending.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 11, 14, 17, 27, 30, 33, 43, 46, 49, 59, 61 and 62 are rejected under 35 U.S.C. 103 as being unpatentable over Aksoy et al. (U.S. Publication No. 2020/0302682) (hereafter, "Aksoy") in view of Luo et al. (U.S. Publication No. 2021/0248769) (hereafter, "Luo").
Regarding claim 1, Aksoy teaches a method, comprising: receiving a first disparity value indicating a difference in field of view between a first image sensor and a second image sensor ([0015] the system can provide a first image of a first view and a second image of a second view captured by the one or more cameras to a motion estimator of a video encoder (e.g., a graphics processing unit (GPU) video encoder). The motion estimator can calculate first and second disparity offsets between the first image and the second image as if the motion estimator were calculating motion vectors between sequential images; [0023] the image capture devices 104a . . . n can use the sensor circuitry to generate the first image 112a corresponding to the first view and the second image 112b corresponding to the second view); receiving a first depth value corresponding to the first disparity value; and ([0035] The depth buffer generator 132 can receive the disparity offsets from the video encoder 124, and generate depth buffers (e.g., depth maps) for each image based on the disparity offsets. For example, the depth buffer generator 132 can generate a first depth buffer based on the first disparity offsets for the first image 112a, and generate a second depth buffer based on the second disparity offsets for the second image 112b).
Aksoy does not expressly teach determining a model for a disparity between the first image sensor and the second image sensor at a plurality of depth values, the model based, at least in part, on the first disparity value and the first depth.
However, Luo teaches determining a model for a disparity between the first image sensor and the second image sensor at a plurality of depth values, the model based, at least in part, on the first disparity value and the first depth ([0035] A machine learning algorithm is applied to the image frames in order to generate multiple disparity maps and multiple confidence maps associated with the disparity maps. Each disparity map is produced using a different pair of the image frames, and each disparity map is associated with a specific baseline direction that identifies an axis along which the two imaging sensors that captured the pair of the image frames are separated … Each confidence map identifies the level of confidence that the machine learning algorithm has in the disparities identified in one of the disparity maps along the associated baseline direction. The disparity maps and the confidence maps can be fused to produce a final depth map of the scene based on the input image frames; [0033] The ability to simultaneously capture multiple images of a scene allows an electronic device to perform disparity processing in order to identify depths of different image pixels within the scene. Disparity refers to the difference in pixel locations of the same point in a scene as captured in different images of the scene. Depth has a known relationship to disparity. A point within a scene that is farther away (has a larger depth) will typically have a smaller disparity, meaning pixels capturing that point in different images will be closer to each other in the images. 
A point within a scene that is closer (has a smaller depth) will typically have a larger disparity, meaning pixels capturing that point in different images will be farther apart from each other in the images; [0063] These correlations are used later to identify how common points in a scene are captured at different pixel locations in the image frames 402 and 404, thereby identifying disparities associated with the image frames 402 and 404; [0055] Because of the offsets of the imaging sensors 202, 204, and 206 in the baseline directions 208 and 210, image frames captured using the imaging sensors have various levels of disparities, which depend on the depths of objects or backgrounds in the scene being imaged; [0056] all three image frames 302, 304, and 306 capture an object 308, which in this example simply represents a triangular shape … As shown in FIG. 3B, there is a horizontal disparity 312 between the object 308 and the ghost object 310 along the baseline direction 208. As shown in FIG. 3C, there is a vertical disparity 314 between the object 308 and the ghost object 310).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy to incorporate the step/system of using a machine learning algorithm to determine disparities between two imaging sensors at different depths, as taught by Luo.
The suggestion/motivation for doing so would have been to improve the accuracy of the depth map ([0036] In this way, it is possible to use image frames captured using three or more cameras or other imaging sensors to significantly increase the accuracy of a final depth map for a scene. Among other reasons, this is because disparities along multiple baseline directions are calculated and used, along with their confidence levels, to generate the final depth map. This also enables various image processing operations to obtain more aesthetically-pleasing or accurate results based on the generated depth maps). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy with Luo to obtain the invention as specified in claim 1.
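For illustration only, the inverse relationship between depth and disparity described in the quoted Luo passage ([0033]) can be sketched as follows. This is a minimal sketch that assumes an ideal rectified pinhole stereo pair, for which the textbook relation disparity = focal length x baseline / depth holds. Neither Aksoy nor Luo, as quoted above, states this formula, and the function names and the parameters focal_px and baseline_m are hypothetical.

```python
def disparity_from_depth(depth_m, focal_px, baseline_m):
    """Disparity in pixels for a point at depth_m.

    Assumes an ideal rectified pinhole stereo pair (an assumption of
    this sketch, not something stated in the cited references).
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Inverse of disparity_from_depth under the same assumption."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical sensor values: 1000 px focal length, 5 cm baseline.
    for depth in (0.5, 1.0, 2.0, 4.0):
        print(depth, disparity_from_depth(depth, 1000.0, 0.05))
```

Under these assumed values a nearer point produces a larger disparity than a farther one, which is consistent with the qualitative statement in Luo [0033].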
Regarding claim 11, the combination of Aksoy and Luo teaches all the limitations of claim 1 above. Aksoy teaches wherein: the first disparity value indicates a difference in field of view along a first axis, and the method further comprising: receiving a second disparity value indicating a difference in field of view between the first image sensor and the second image sensor along a second axis different from the first axis ([0015] the system can provide a first image of a first view and a second image of a second view captured by the one or more cameras to a motion estimator of a video encoder (e.g., a graphics processing unit (GPU) video encoder); [0022] The system 100 can include a first image capture device 104a that includes a first lens 108a, the first image capture device 104a arranged to capture a first image 112a of a first view, and a second image capture device 104b that includes a second lens 108b, the second image capture device 104b arranged to capture a second image 112b of a second view. The first view and the second view may correspond to different perspectives, enabling depth information to be extracted from the first image 112a and second image 112b. For example, the first view may correspond to a left eye view, and the second view may correspond to a right eye view).
Aksoy does not expressly teach wherein the step of determining the model is further based, at least in part, on the second disparity value.
However, Luo teaches wherein the step of determining the model is further based, at least in part, on the second disparity value ([0062] Each feature extractor 408, 410, and 412 may represent a trained machine learning model or other algorithm for identifying features of image frames. Each feature extractor 408, 410, and 412 may use any suitable technique to identify features of input image frames, such as when implemented using multiple layers of a trained convolutional neural network (CNN); [0063] These correlations are used later to identify how common points in a scene are captured at different pixel locations in the image frames 402 and 404, thereby identifying disparities associated with the image frames 402 and 404 … These correlations are used later to identify how common points in the scene are captured at different pixel locations in the image frames 402 and 406, thereby identifying disparities associated with the image frames 402 and 406. Each cross-correlation function 420 and 422 may represent a trained machine learning model or other algorithm for identifying correlations between features of image frames; [0056] As shown in FIG. 3B, there is a horizontal disparity 312 between the object 308 and the ghost object 310 along the baseline direction 208. As shown in FIG. 3C, there is a vertical disparity 314 between the object 308 and the ghost object 310 along the baseline direction 210; [0057] the electronic device 101 or other device can process three or more image frames of a scene and generate multiple disparity maps that identify disparities between the image frames along multiple baseline directions).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy to incorporate the step/system of applying the machine learning algorithm by using disparity maps taught by Luo. 
Motivation for this combination has been stated in claim 1.
Regarding claim 14, the combination of Aksoy and Luo teaches all the limitations of claim 1 above. Aksoy teaches further comprising determining the first depth value based on range imaging ([0035] The processing circuitry 116 can include a depth buffer generator 132. The depth buffer generator 132 can receive the disparity offsets from the video encoder 124, and generate depth buffers (e.g., depth maps) for each image based on the disparity offsets. For example, the depth buffer generator 132 can generate a first depth buffer based on the first disparity offsets for the first image 112a, and generate a second depth buffer based on the second disparity offsets for the second image 112b; [0033] Where the elements of the images 112a, 112b include multiple first pixels of the first image 112a and multiple second pixels of the second image 112b, the motion estimator 128 can generate the first disparity offset to include one or more vectors (or angle and distance values) corresponding to comparing each first pixel to each corresponding second pixel, or comparing one or more representative first pixels to one or more representative second pixels). 
With respect to claim 17, arguments analogous to those presented for claim 1 are applicable.
With respect to claim 27, arguments analogous to those presented for claim 11 are applicable.
With respect to claim 30, arguments analogous to those presented for claim 14 are applicable.
With respect to claim 33, arguments analogous to those presented for claim 1 are applicable.
With respect to claim 43, arguments analogous to those presented for claim 11 are applicable.
With respect to claim 46, arguments analogous to those presented for claim 14 are applicable.
Regarding claim 49, Aksoy teaches a device, comprising: a first image sensor configured with a first field of view; a second image sensor configured with a second field of view at least partially overlapping the first field of view; a processor coupled to the first image sensor and coupled to the second image sensor ([0022] The system 100 can include a first image capture device 104a that includes a first lens 108a, the first image capture device 104a arranged to capture a first image 112a of a first view, and a second image capture device 104b that includes a second lens 108b, the second image capture device 104b arranged to capture a second image 112b of a second view. The first view and the second view may correspond to different perspectives, enabling depth information to be extracted from the first image 112a and second image 112b. For example, the first view may correspond to a left eye view, and the second view may correspond to a right eye view); and a memory coupled to the processor ([0070] The system memory can store some or all of the instructions and data that processing unit(s) 404 need at runtime), wherein the processor is configured to perform steps comprising: receiving a first disparity value indicating a difference in field of view between the first image sensor and the second image sensor ([0015] the system can provide a first image of a first view and a second image of a second view captured by the one or more cameras to a motion estimator of a video encoder (e.g., a graphics processing unit (GPU) video encoder). The motion estimator can calculate first and second disparity offsets between the first image and the second image as if the motion estimator were calculating motion vectors between sequential images; [0023] the image capture devices 104a . . . n can use the sensor circuitry to generate the first image 112a corresponding to the first view and the second image 112b corresponding to the second view); receiving a first depth value corresponding to the first disparity value; and ([0035] The depth buffer generator 132 can receive the disparity offsets from the video encoder 124, and generate depth buffers (e.g., depth maps) for each image based on the disparity offsets. For example, the depth buffer generator 132 can generate a first depth buffer based on the first disparity offsets for the first image 112a, and generate a second depth buffer based on the second disparity offsets for the second image 112b).
Aksoy does not expressly teach determining a model for a disparity between the first image sensor and the second image sensor at a plurality of depth values, the model based, at least in part, on the first disparity value and the first depth.
However, Luo teaches determining a model for a disparity between the first image sensor and the second image sensor at a plurality of depth values, the model based, at least in part, on the first disparity value and the first depth ([0035] A machine learning algorithm is applied to the image frames in order to generate multiple disparity maps and multiple confidence maps associated with the disparity maps. Each disparity map is produced using a different pair of the image frames, and each disparity map is associated with a specific baseline direction that identifies an axis along which the two imaging sensors that captured the pair of the image frames are separated; [0033] The ability to simultaneously capture multiple images of a scene allows an electronic device to perform disparity processing in order to identify depths of different image pixels within the scene. Disparity refers to the difference in pixel locations of the same point in a scene as captured in different images of the scene. Depth has a known relationship to disparity. A point within a scene that is farther away (has a larger depth) will typically have a smaller disparity, meaning pixels capturing that point in different images will be closer to each other in the images. 
A point within a scene that is closer (has a smaller depth) will typically have a larger disparity, meaning pixels capturing that point in different images will be farther apart from each other in the images; [0063] These correlations are used later to identify how common points in a scene are captured at different pixel locations in the image frames 402 and 404, thereby identifying disparities associated with the image frames 402 and 404; [0055] Because of the offsets of the imaging sensors 202, 204, and 206 in the baseline directions 208 and 210, image frames captured using the imaging sensors have various levels of disparities, which depend on the depths of objects or backgrounds in the scene being imaged; [0056] all three image frames 302, 304, and 306 capture an object 308, which in this example simply represents a triangular shape … As shown in FIG. 3B, there is a horizontal disparity 312 between the object 308 and the ghost object 310 along the baseline direction 208. As shown in FIG. 3C, there is a vertical disparity 314 between the object 308 and the ghost object 310).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy to incorporate the step/system of using a machine learning algorithm to determine disparities between two imaging sensors at different depths, as taught by Luo.
Motivation for this combination has been stated in claim 1.
With respect to claim 59, arguments analogous to those presented for claim 11 are applicable.

Regarding claim 61, the combination of Aksoy and Luo teaches all the limitations of claim 49 above. Aksoy teaches further comprising a depth sensor, wherein the processor is coupled to the depth sensor and is configured to receive the first depth value from the depth sensor ([0022] The system 100 can include a first image capture device 104a that includes a first lens 108a, the first image capture device 104a arranged to capture a first image 112a of a first view, and a second image capture device 104b that includes a second lens 108b, the second image capture device 104b arranged to capture a second image 112b of a second view. The first view and the second view may correspond to different perspectives, enabling depth information to be extracted from the first image 112a and second image 112b).
With respect to claim 62, arguments analogous to those presented for claim 14 are applicable.

Claims 2, 3, 9, 10, 16, 18, 19, 25, 26, 32, 34, 35, 41, 42, 48, 50, 51, 57, 58 and 64 are rejected under 35 U.S.C. 103 as being unpatentable over Aksoy et al. (U.S. Publication No. 2020/0302682) (hereafter, "Aksoy") in view of Luo et al. (U.S. Publication No. 2021/0248769) (hereafter, "Luo") and further in view of Obla et al. (U.S. Publication No. 2022/0046219) (hereafter, "Obla").
Regarding claim 2, the combination of Aksoy and Luo teaches all the limitations of claim 1 above. Aksoy teaches further comprising: receiving an input image frame from one of the first image sensor or the second image sensor ([0059] At 305, images are captured by one or more image capture devices. The images can be captured by receiving light via a lens at a sensor, and using the sensor to generate an image based on the received light, such as by generating an image representing the received light through a field of view of the lens as a data structure of pixels. The images can be captured continuously); receiving an input image depth corresponding to the input image frame ([0066] The display images can be reprojected by executing a PTW algorithm. For example, a first pixel of the first image can be identified, a corresponding first depth value of the first depth buffer can be identified, and a first display pixel can be adjusted used to present the information of the first pixel of the first image in the first display image. The PTW algorithm can be executed on a layer including the images, the corresponding depth buffers, and corresponding historical poses. The display images can be rendered using motion data regarding movement of the image capture devices. For example, motion data received from a position sensor can be used to account for a change in at least one of a position or an orientation of the HMD resulting from head movement).
The combination of Aksoy and Luo does not expressly teach determining a predicted disparity value corresponding to the input image depth based, at least in part, on the model; and determining a corrected image frame based, at least in part, on the input image frame and the predicted disparity value.
However, Obla teaches determining a predicted disparity value corresponding to the input image depth based, at least in part, on the model ([0055] the present invention provides a computer-implemented method comprising storing data defining a statistical model to predict depth data throughout the field of view of each subaperture image collectively comprising a multi-aperture image frame; and training the model on at least one input set of subaperture images, by: predicting, for at least one subaperture image in the set, corresponding disparity values throughout the field of view, computing disparity from at least two subaperture images in the set in the region where the fields of view overlap between the at least two subaperture images, and updating the model based on a cost function of the predicted disparity and computed disparity that enforces consistency between the predicted and computed disparity values for each subaperture image in the multi-aperture set); and determining a corrected image frame based, at least in part, on the input image frame and the predicted disparity value ([0103] The high and low resolution predicted disparity maps 245 and 246, respectively, are output from the respective subnetworks, 11-1 and 11-2, of the CNN module 11 to a disparity-to-range module 6 that is configured to generate an integrated range map 40 from the predicted disparity maps 245 and 246. In an embodiment, the subnetwork 11-1 of the CNN module 11 predicts disparity values from the higher magnification subaperture image 240 and generates the high-resolution disparity map 245 and the subnetwork 11-2 of the CNN module 11 predicts disparity values from lower magnification subaperture image 242 and generates the low-resolution predicted disparity map 246, where disparity values of the disparity maps are representative of the intra-image differences in position of detected objects or features within the respective subaperture image. 
That is, the high-resolution predicted disparity map 245 is predicted from the subaperture image 240 by the subnetwork 11-1 and is representative of disparity of objects captured in the subaperture image 240 and the low-resolution predicted disparity map 246 is predicted from the subaperture image 242 by the subnetwork 11-2 and is representative of disparity of objects captured in the subaperture image 242).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy to incorporate the step/system of predicting the disparity values corresponding to depth data by training the model and generating an integrated range map based on predicted disparity values and subaperture input image taught by Obla.
The suggestion/motivation for doing so would have been to improve the quality of range and depth results ([0053] What is desired is a device, an network architecture, and/or techniques that address limitations of conventional image and video based ranging techniques, thereby significantly increasing the quality and quantity of range and depth results, whereby a deep learning neural network based system for depth estimation incorporates prediction of depth throughout the entire field of view of the lens, where also multi-aperture data is available in the near field of input images, and where training a neural network (e.g., a CNN) is not fully dependent on training data originating from a different system; [0066] the learning module implements a loss function that enforces consistency between the predicted depth maps from each subaperture view during training, leading to improved predictions). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy with Luo and Obla to obtain the invention as specified in claim 2.
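For illustration only, the sequence recited in claim 2 (a model of disparity across depths, a predicted disparity for an input image depth, and a corrected image frame) can be sketched with a simple parametric model. This is a minimal sketch, not the method of Obla, which as quoted above uses a trained CNN, and not the applicant's disclosed implementation. It assumes a hypothetical model of the form disparity = a / depth + b fitted by least squares, and it reduces the correction to a horizontal shift of a single pixel row. All function names and sample values are hypothetical.

```python
def fit_inverse_depth_model(samples):
    """Least-squares fit of disparity ~ a / depth + b.

    samples: list of (depth, disparity) pairs with at least two
    distinct depths. The model form is an assumption of this sketch.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [1.0 / depth for depth, _ in samples]
    ys = [disparity for _, disparity in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover more than one depth")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_disparity(model, depth):
    """Predicted disparity at the given depth from the fitted model."""
    a, b = model
    return a / depth + b


def shift_row(row, disparity_px, fill=0):
    """Shift one pixel row right by the rounded disparity.

    Pixels shifted outside the row are dropped; vacated positions
    receive the fill value.
    """
    shift = round(disparity_px)
    width = len(row)
    out = [fill] * width
    for x, value in enumerate(row):
        target = x + shift
        if 0 <= target < width:
            out[target] = value
    return out


if __name__ == "__main__":
    # Hypothetical (depth, disparity) samples.
    model = fit_inverse_depth_model([(1.0, 52.0), (2.0, 27.0), (5.0, 12.0)])
    predicted = predict_disparity(model, 10.0)
    print(model, predicted)
    print(shift_row([1, 2, 3, 4, 5], 2.0))
```

A practical system would warp a full two-dimensional frame and handle occlusions and sub-pixel disparities; the single-row shift is used here only to keep the sketch short.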
Regarding claim 3, the combination of Aksoy, Luo and Obla teaches all the limitations of claim 2 above. Aksoy teaches wherein determining the corrected image frame comprises determining, based on the predicted disparity value, the corrected image frame as warped to match a field of view of the other of the first image sensor or the second image sensor ([0044] The image renderer 136 can reproject the images 112a . . . k by executing a positional time warp (PTW) algorithm. For example, the image renderer 136 can identify a first pixel of the first image 112a, identify a corresponding first depth value of the first depth buffer, and adjust a first display pixel used to present the information of the first pixel of the first image 112a in the first display image; [0041] This modification of the display data may be referred to as warping. Reprojecting can correct or adjust for positional movement in the HMD or view of the wearer of the HMD, referred to as asynchronous timewarp. Reprojecting can correct or adjust for positional and rotational movement in the HMD or view of the wearer of the HMD, referred to as position time warp (PSW); [0035] The depth buffer generator 132 can receive the disparity offsets from the video encoder 124, and generate depth buffers (e.g., depth maps) for each image based on the disparity offsets).
Regarding claim 9, the combination of Aksoy, Luo and Obla teaches all the limitations of claim 2 above. Aksoy teaches further comprising: determining a video sequence comprising a first image frame from the first image sensor, a second image frame from the second image sensor, and the corrected image frame, wherein the corrected image frame appears in the video sequence between the first image frame and the second image frame ([0015] the system can provide a first image of a first view and a second image of a second view captured by the one or more cameras to a motion estimator of a video encoder (e.g., a graphics processing unit (GPU) video encoder). The motion estimator can calculate first and second disparity offsets between the first image and the second image as if the motion estimator were calculating motion vectors between sequential images. The disparity offsets can be processed, filtered, and smoothed to remove artifacts; [0022] The system 100 can include a first image capture device 104a that includes a first lens 108a, the first image capture device 104a arranged to capture a first image 112a of a first view, and a second image capture device 104b that includes a second lens 108b, the second image capture device 104b arranged to capture a second image 112b of a second view; [0043] The image renderer 136 can generate the display images by reprojecting the images 112a . . . k using the corresponding depth buffers. For example, the image renderer 136 can reproject the images 112a . . . k to position the images 112a . . . k in a correct image space or an image space that a user of the HMD is expected to perceive when the display images are displayed).
Regarding claim 10, the combination of Aksoy, Luo and Obla teaches all the limitations of claim 2 above. Aksoy teaches wherein the step of determining the corrected image frame is further based on an image frame of the other of the first image sensor or the second image sensor ([0015] the system can provide a first image of a first view and a second image of a second view captured by the one or more cameras; [0022] The system 100 can include a first image capture device 104a that includes a first lens 108a, the first image capture device 104a arranged to capture a first image 112a of a first view, and a second image capture device 104b that includes a second lens 108b, the second image capture device 104b arranged to capture a second image 112b of a second view; [0043] The image renderer 136 can generate the display images by reprojecting the images 112a . . . k using the corresponding depth buffers. For example, the image renderer 136 can reproject the images 112a . . . k to position the images 112a . . . k in a correct image space or an image space that a user of the HMD is expected to perceive when the display images are displayed).
Regarding claim 16, the combination of Aksoy, Luo and Obla teaches all the limitations of claim 14 above. The combination of Aksoy and Luo does not expressly teach wherein determining the first depth value comprises determining the first depth value based on a light detection and ranging (LIDAR) measurement.
However, Obla teaches wherein determining the first depth value comprises determining the first depth value based on a light detection and ranging (LIDAR) measurement ([0090] The source of external disparity range and depth data 47 may include by way of example and not limitation: LiDAR, Radar, Stereoscopic, Topographical, and hand measured data; [0079] The multi-aperture camera 200 captures information (or data) about the light field emanating from an object of interest in the camera's field of view 205. Such imaging data includes information about the intensity of the light emanating from the object of interest and also information about the direction that the light rays are traveling in space).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy to incorporate the step/system of determining the depth data based on a light detection and ranging (LIDAR) measurement, as taught by Obla.
Motivation for this combination has been stated in claim 2.

With respect to claim 18, arguments analogous to those presented for claim 2 are applicable.
With respect to claim 19, arguments analogous to those presented for claim 3 are applicable.
With respect to claim 25, arguments analogous to those presented for claim 9 are applicable.
With respect to claim 26, arguments analogous to those presented for claim 10 are applicable.
With respect to claim 32, arguments analogous to those presented for claim 16 are applicable.

With respect to claim 34, arguments analogous to those presented for claim 2 are applicable.
With respect to claim 35, arguments analogous to those presented for claim 3 are applicable.
With respect to claim 41, arguments analogous to those presented for claim 9 are applicable.
With respect to claim 42, arguments analogous to those presented for claim 10 are applicable.
With respect to claim 48, arguments analogous to those presented for claim 16 are applicable.

With respect to claim 50, arguments analogous to those presented for claim 2 are applicable.
With respect to claim 51, arguments analogous to those presented for claim 3 are applicable.
With respect to claim 57, arguments analogous to those presented for claim 9 are applicable.
With respect to claim 58, arguments analogous to those presented for claim 10 are applicable.
With respect to claim 64, arguments analogous to those presented for claim 16 are applicable.
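
For illustration only, the way a measured depth value (for example, from LIDAR) relates to a disparity value can be sketched with the standard rectified-stereo pinhole relation, disparity = focal length x baseline / depth. This relation is an assumption introduced here for clarity; it is not quoted from Obla, Aksoy, or Luo, and the names `focal_px` and `baseline_m` are hypothetical calibration parameters.

```python
def depth_to_disparity_px(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """Convert a depth (meters) to a disparity (pixels) for a rectified stereo pair.

    Assumes an ideal pinhole model: disparity = focal_px * baseline_m / depth_m.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def disparity_to_depth_m(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Inverse of depth_to_disparity_px under the same pinhole assumption."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical calibration: 1000 px focal length, 1 cm sensor baseline.
example_disparity = depth_to_disparity_px(2.0, focal_px=1000.0, baseline_m=0.01)
```

Under this sketch, nearer objects produce larger disparities, which is why an external range measurement can serve as a reference for a predicted disparity.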

Claims 4, 20, 36 and 52 are rejected under 35 U.S.C. 103 as being unpatentable over Aksoy et al. (U.S. Publication No. 2020/0302682) (hereafter, "Aksoy") in view of Luo et al. (U.S. Publication No. 2021/0248769) (hereafter, "Luo") and further in view of Obla et al. (U.S. Publication No. 2022/0046219) (hereafter, "Obla") and LI et al. (U.S. Publication No. 2022/0067881) (hereafter, "LI").
Regarding claim 4, the combination of Aksoy, Luo and Obla teaches all the limitations of claim 2 above. The combination of Aksoy, Luo and Obla does not expressly teach wherein determining the corrected image frame comprises: determining a transformation matrix for warping the input image frame to a field of view of the other of the first image sensor or the second image sensor.
However, LI teaches wherein determining the corrected image frame comprises: determining a transformation matrix for warping the input image frame to a field of view of the other of the first image sensor or the second image sensor ([0024] In step S110, an image IMG1 containing at least one character is received by the deep learning model 110, and a perspective transformation matrix T is generated according to the image IMG1; [0025] In step S120, a perspective transformation is performed on the image IMG1 by the processing unit 120 according to the perspective transformation matrix T to obtain a corrected image IMG2 containing a front view of the at least one character).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy, Luo and Obla to incorporate the step/system of applying a perspective transformation matrix to an image to obtain a corrected image containing a particular view, as taught by LI.
The suggestion/motivation for doing so would have been to improve recognition accuracy and the efficiency of image correction ([0006] The disclosure is directed to an image correction method and a system based on deep learning. The perspective transformation parameters for the image correction procedure are found by a deep learning model and used to efficiently correct various images to front view images and further update the deep learning model using the loss value to increase the recognition accuracy). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy, Luo and Obla with LI to obtain the invention as specified in claim 4.
With respect to claim 20, arguments analogous to those presented for claim 4 are applicable.
With respect to claim 36, arguments analogous to those presented for claim 4 are applicable.
With respect to claim 52, arguments analogous to those presented for claim 4 are applicable.
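
For illustration only, the perspective transformation described in LI at [0024]-[0025] can be sketched as applying a 3x3 matrix T to homogeneous pixel coordinates. The sketch below is not LI's implementation; the function names and the example matrices are hypothetical, and how T is obtained (in LI, by a deep learning model) is outside its scope.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def warp_point(T: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map one pixel coordinate through a 3x3 perspective transformation matrix."""
    xh = T[0][0] * x + T[0][1] * y + T[0][2]
    yh = T[1][0] * x + T[1][1] * y + T[1][2]
    w = T[2][0] * x + T[2][1] * y + T[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this transform")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return xh / w, yh / w


def warp_corners(T: Matrix3, width: int, height: int) -> List[Tuple[float, float]]:
    """Warp the four image corners, e.g. to find the output field of view."""
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return [warp_point(T, x, y) for x, y in corners]


# Hypothetical matrix: a pure translation by (+5, -3) pixels.
T_SHIFT = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
```

A full image warp would apply the inverse mapping to every output pixel and interpolate; the point mapping above is the core step.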

Claims 5, 15, 21, 31, 37, 47, 53 and 63 are rejected under 35 U.S.C. 103 as being unpatentable over Aksoy et al. (U.S. Publication No. 2020/0302682) (hereafter, "Aksoy") in view of Luo et al. (U.S. Publication No. 2021/0248769) (hereafter, "Luo") and further in view of Obla et al. (U.S. Publication No. 2022/0046219) (hereafter, "Obla"), LI et al. (U.S. Publication No. 2022/0067881) (hereafter, "LI") and KONISHI (U.S. Publication No. 2022/0357153) (hereafter, "KONISHI").
Regarding claim 5, the combination of Aksoy, Luo, Obla and LI teaches all the limitations of claim 4 above. The combination of Aksoy, Luo, Obla and LI does not expressly teach wherein determining the transformation matrix comprises determining the transformation matrix with computer vision processing (CVP).
However, KONISHI teaches wherein determining the transformation matrix comprises determining the transformation matrix with computer vision processing (CVP) ([0021] Each transformation matrix calculated as above is thus more accurate. In other words, this structure further allows more accurate calibration of 3D measurement in the computer vision system).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy, Luo, Obla and LI to incorporate the step/system of calculating a transformation matrix in a computer vision system, as taught by KONISHI.
The suggestion/motivation for doing so would have been to improve the accuracy of calibration for a computer vision system ([0011] the 3D reference object shaped asymmetric as viewed in any direction and with predetermined dimensions is used as a reference for recognizing its position and orientation in the 3D measurement. This allows accurate calculation of the position and the orientation of the 3D reference object relative to the measurement unit coordinate system defined for the 3D measurement unit. More accurate calculation of the position and the orientation of the 3D reference object relative to the measurement unit coordinate system allows accurate calculation of the reference-measurement unit transformation matrix representing a coordinate transformation between the reference coordinate system and the measurement unit coordinate system. This allows accurate calibration for a computer vision system including a 3D measurement unit). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy, Luo, Obla and LI with KONISHI to obtain the invention as specified in claim 5.
Regarding claim 15, the combination of Aksoy and Luo teaches all the limitations of claim 14 above. The combination of Aksoy and Luo does not expressly teach wherein determining the first depth value comprises determining the first depth value based on a time of flight (ToF) measurement.
However, KONISHI teaches wherein determining the first depth value comprises determining the first depth value based on a time of flight (ToF) measurement ([0050] Any other method may be used to generate 3D information about the target objects, such as photometric stereo, a time-of-flight (TOF) method, or phase shifting).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy to incorporate the step/system of determining a depth value based on a time of flight (TOF) method taught by KONISHI.
The motivation for this combination is the same as that stated in the rejection of claim 5.
With respect to claim 21, arguments analogous to those presented for claim 5 are applicable.
With respect to claim 31, arguments analogous to those presented for claim 15 are applicable.
With respect to claim 37, arguments analogous to those presented for claim 5 are applicable.
With respect to claim 47, arguments analogous to those presented for claim 15 are applicable.
With respect to claim 53, arguments analogous to those presented for claim 5 are applicable.
With respect to claim 63, arguments analogous to those presented for claim 15 are applicable.
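
For illustration only, a time-of-flight (ToF) depth value is conventionally derived from the round-trip travel time of emitted light: depth = c x t / 2. KONISHI at [0050] names the ToF method without giving this formula, so the sketch below is an assumption added for clarity, and `round_trip_s` is a hypothetical input.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact SI value


def tof_depth_m(round_trip_s: float) -> float:
    """Return depth in meters from a round-trip light travel time in seconds.

    The light travels to the object and back, so the one-way distance
    is half of the total path length.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# A 20 ns round trip corresponds to a depth of roughly 3 meters.
example_depth = tof_depth_m(20e-9)
```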

Claims 7, 8, 12, 23, 24, 28, 39, 40, 44, 55, 56 and 60 are rejected under 35 U.S.C. 103 as being unpatentable over Aksoy et al. (U.S. Publication No. 2020/0302682) (hereafter, "Aksoy") in view of Luo et al. (U.S. Publication No. 2021/0248769) (hereafter, "Luo") and further in view of Obla et al. (U.S. Publication No. 2022/0046219) (hereafter, "Obla") and Boisson et al. (U.S. Publication No. 2013/0176388) (hereafter, "Boisson").
Regarding claim 7, the combination of Aksoy, Luo and Obla teaches all the limitations of claim 2 above. Obla teaches wherein the step of determining the corrected image frame based, at least in part, on the input image frame and the predicted disparity value is performed ([0103] The high and low resolution predicted disparity maps 245 and 246, respectively, are output from the respective subnetworks, 11-1 and 11-2, of the CNN module 11 to a disparity-to-range module 6 that is configured to generate an integrated range map 40 from the predicted disparity maps 245 and 246. In an embodiment, the subnetwork 11-1 of the CNN module 11 predicts disparity values from the higher magnification subaperture image 240 and generates the high-resolution disparity map 245 and the subnetwork 11-2 of the CNN module 11 predicts disparity values from lower magnification subaperture image 242 and generates the low-resolution predicted disparity map 246, where disparity values of the disparity maps are representative of the intra-image differences in position of detected objects or features within the respective subaperture image. That is, the high-resolution predicted disparity map 245 is predicted from the subaperture image 240 by the subnetwork 11-1 and is representative of disparity of objects captured in the subaperture image 240 and the low-resolution predicted disparity map 246 is predicted from the subaperture image 242 by the subnetwork 11-2 and is representative of disparity of objects captured in the subaperture image 242).
The combination of Aksoy, Luo and Obla does not expressly teach further comprising: determining an image characteristic of the input image frame is below a threshold level, based on determining the image characteristic is below the threshold level.
However, Boisson teaches further comprising: determining an image characteristic of the input image frame is below a threshold level, based on determining the image characteristic is below the threshold level ([0065] a threshold may be established to define the still zones as the zones where the differences (in absolute value) between two corresponding pixels of the successive images are below the threshold; [0067] Bilateral filtering is done by replacing the luminance value of each pixel of the difference image by a linear combination of the amplitude of luminance of a pixel and the neighboring pixels ... The coefficients of the linear combination depends from the distance between the considered pixel and a neighboring pixel; the farther the pixel, the less the coefficient; thus the luminance of a pixel is more influenced by the pixels that are closer than by the pixels that are farther. The coefficients also depend from the difference (in absolute value) of luminances between the considered pixel and the neighboring pixel; the higher the difference, the less the coefficient; thus the luminance of a pixel is more influenced by pixels that have a similar luminance than by pixels that have a significant difference of luminances).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy, Luo and Obla to incorporate the step/system of determining that an image characteristic of the images (the pixel differences defining a still zone) is below a threshold level, as taught by Boisson.
The suggestion/motivation for doing so would have been to reduce viewer discomfort caused by undesirable artifacts, such as flickering, jittering, or false colors, in the disparities of successive images ([0001] The invention relates to three dimensional video imaging, in which at least a left view and a right view of a moving scene are produced and a map of disparities is produced for all pixels of the successive images of a video sequence. The purpose of the invention is to provide an improved manner of associating disparities to successive images; [0012] This variation in the estimations from image to image will result, at the time of reproduction on a display, in artifacts such as flickering or jittering, false colors, etc. They are uncomfortable for the viewer; [0013] The invention aims at reducing at least in part this discomfort due to undesirable artifacts). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy, Luo and Obla with Boisson to obtain the invention as specified in claim 7.
Regarding claim 8, the combination of Aksoy, Luo and Obla teaches all the limitations of claim 2 above. Obla teaches wherein the step of determining the corrected image frame based, at least in part, on the input image frame and the predicted disparity value is performed ([0103] The high and low resolution predicted disparity maps 245 and 246, respectively, are output from the respective subnetworks, 11-1 and 11-2, of the CNN module 11 to a disparity-to-range module 6 that is configured to generate an integrated range map 40 from the predicted disparity maps 245 and 246. In an embodiment, the subnetwork 11-1 of the CNN module 11 predicts disparity values from the higher magnification subaperture image 240 and generates the high-resolution disparity map 245 and the subnetwork 11-2 of the CNN module 11 predicts disparity values from lower magnification subaperture image 242 and generates the low-resolution predicted disparity map 246, where disparity values of the disparity maps are representative of the intra-image differences in position of detected objects or features within the respective subaperture image. That is, the high-resolution predicted disparity map 245 is predicted from the subaperture image 240 by the subnetwork 11-1 and is representative of disparity of objects captured in the subaperture image 240 and the low-resolution predicted disparity map 246 is predicted from the subaperture image 242 by the subnetwork 11-2 and is representative of disparity of objects captured in the subaperture image 242).
The combination of Aksoy, Luo and Obla does not expressly teach further comprising: determining a brightness of the input image frame is below a threshold level, based on determining the brightness is below the threshold level.
However, Boisson teaches further comprising: determining a brightness of the input image frame is below a threshold level, based on determining the brightness is below the threshold level ([0065] a threshold may be established to define the still zones as the zones where the differences (in absolute value) between two corresponding pixels of the successive images are below the threshold; [0067] Bilateral filtering is done by replacing the luminance value of each pixel of the difference image by a linear combination of the amplitude of luminance of a pixel and the neighboring pixels ... The coefficients of the linear combination depends from the distance between the considered pixel and a neighboring pixel; the farther the pixel, the less the coefficient; thus the luminance of a pixel is more influenced by the pixels that are closer than by the pixels that are farther. The coefficients also depend from the difference (in absolute value) of luminances between the considered pixel and the neighboring pixel; the higher the difference, the less the coefficient; thus the luminance of a pixel is more influenced by pixels that have a similar luminance than by pixels that have a significant difference of luminances).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy, Luo and Obla to incorporate the step/system of determining that the difference of luminances (brightness) of the images is below a threshold level, as taught by Boisson.
The suggestion/motivation for doing so would have been to reduce viewer discomfort caused by undesirable artifacts, such as flickering, jittering, or false colors, in the disparities of successive images ([0001] The invention relates to three dimensional video imaging, in which at least a left view and a right view of a moving scene are produced and a map of disparities is produced for all pixels of the successive images of a video sequence. The purpose of the invention is to provide an improved manner of associating disparities to successive images; [0012] This variation in the estimations from image to image will result, at the time of reproduction on a display, in artifacts such as flickering or jittering, false colors, etc. They are uncomfortable for the viewer; [0013] The invention aims at reducing at least in part this discomfort due to undesirable artifacts). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy, Luo and Obla with Boisson to obtain the invention as specified in claim 8.
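
For illustration only, the still-zone test quoted from Boisson at [0065] can be sketched as a per-pixel comparison of two successive luminance images against a threshold. The sketch is not Boisson's implementation; the function name, the list-of-rows image layout, and the example values are hypothetical.

```python
from typing import List


def still_zone_mask(prev: List[List[float]], curr: List[List[float]],
                    threshold: float) -> List[List[bool]]:
    """Mark pixels whose absolute luminance change between frames is below threshold.

    prev and curr are luminance images of equal size, stored as lists of rows.
    A True entry means the pixel belongs to a still zone.
    """
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("images must have the same dimensions")
    return [
        [abs(c - p) < threshold for p, c in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(prev, curr)
    ]


# Hypothetical 2x2 luminance frames: only the bottom-left pixel changes strongly.
FRAME_A = [[10, 10], [10, 10]]
FRAME_B = [[10, 12], [40, 9]]
EXAMPLE_MASK = still_zone_mask(FRAME_A, FRAME_B, threshold=3)
```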
Regarding claim 12, the combination of Aksoy and Luo teaches all the limitations of claim 1 above. Luo teaches wherein determining the model comprises: storing a plurality of disparity values, wherein the model is based on the plurality of disparity values; and ([0035] The imaging sensors are arranged in a non-linear manner so that the image frames captured using the imaging sensors are displaced along multiple baseline directions (such as horizontally and vertically). As a result, the input image frames have disparities in multiple directions. A machine learning algorithm is applied to the image frames in order to generate multiple disparity maps and multiple confidence maps associated with the disparity maps).
The combination of Aksoy and Luo does not expressly teach replacing a previous value of the plurality of disparity values with the first disparity value based on at least one of a number of values in the plurality of disparity values or a time associated with the previous value.
However, Boisson teaches replacing a previous value of the plurality of disparity values with the first disparity value based on at least one of a number of values in the plurality of disparity values or a time associated with the previous value ([0074] the modified value is derived from a previous disparity value estimated or assigned to a corresponding pixel of the first image of the previous pair; [0076] The map of disparities associated with the current pair of images thus comprises pixels having disparities estimated from the content of the current pair of images, and disparities estimated from at least one previously computed disparity; [0078] Instead of being precisely the previously assigned disparity, the currently assigned disparity may also be a mean value, or alternatively a median value, of several disparities obtained for the considered pixel in a number of previous images).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy and Luo to incorporate the step/system of modifying a disparity value based on several disparities obtained for the considered pixel in a number of previous images, as taught by Boisson.
The suggestion/motivation for doing so would have been to reduce viewer discomfort caused by undesirable artifacts, such as flickering, jittering, or false colors, in the disparities of successive images ([0001] The invention relates to three dimensional video imaging, in which at least a left view and a right view of a moving scene are produced and a map of disparities is produced for all pixels of the successive images of a video sequence. The purpose of the invention is to provide an improved manner of associating disparities to successive images; [0012] This variation in the estimations from image to image will result, at the time of reproduction on a display, in artifacts such as flickering or jittering, false colors, etc. They are uncomfortable for the viewer; [0013] The invention aims at reducing at least in part this discomfort due to undesirable artifacts). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy and Luo with Boisson to obtain the invention as specified in claim 12.
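
For illustration only, the claim 12 limitation (replacing a previous stored disparity value based on the number of stored values or on a time associated with the previous value) and the median option quoted from Boisson at [0078] can be sketched as a bounded history buffer. This is a hypothetical sketch, not the applicant's or Boisson's implementation; the class name and the `max_values` and `max_age` parameters are assumptions.

```python
from collections import deque
from statistics import median
from typing import Deque, List, Tuple


class DisparityHistory:
    """Keeps recent disparity values, evicting by count and by age."""

    def __init__(self, max_values: int, max_age: float) -> None:
        if max_values < 1:
            raise ValueError("max_values must be at least 1")
        self.max_values = max_values
        self.max_age = max_age
        self._entries: Deque[Tuple[float, float]] = deque()  # (timestamp, disparity)

    def add(self, timestamp: float, disparity: float) -> None:
        # Time-based replacement: drop values older than max_age.
        while self._entries and timestamp - self._entries[0][0] > self.max_age:
            self._entries.popleft()
        # Count-based replacement: drop the oldest value when the buffer is full.
        while len(self._entries) >= self.max_values:
            self._entries.popleft()
        self._entries.append((timestamp, disparity))

    def values(self) -> List[float]:
        return [d for _, d in self._entries]

    def median(self) -> float:
        # Median of the stored disparities; the buffer must not be empty.
        return median(self.values())
```

Timestamps are assumed to be supplied in non-decreasing order, so the oldest entry is always at the front of the buffer.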
With respect to claim 23, arguments analogous to those presented for claim 7 are applicable.
With respect to claim 24, arguments analogous to those presented for claim 8 are applicable.
With respect to claim 28, arguments analogous to those presented for claim 12 are applicable.
With respect to claim 39, arguments analogous to those presented for claim 7 are applicable.
With respect to claim 40, arguments analogous to those presented for claim 8 are applicable.
With respect to claim 44, arguments analogous to those presented for claim 12 are applicable.
With respect to claim 55, arguments analogous to those presented for claim 7 are applicable.
With respect to claim 56, arguments analogous to those presented for claim 8 are applicable.
With respect to claim 60, arguments analogous to those presented for claim 12 are applicable.

Claims 13, 29, 45 and 61 are rejected under 35 U.S.C. 103 as being unpatentable over Aksoy et al. (U.S. Publication No. 2020/0302682) (hereafter, "Aksoy") in view of Luo et al. (U.S. Publication No. 2021/0248769) (hereafter, "Luo") and further in view of YOON et al. (U.S. Publication No. 2008/0219655) (hereafter, "YOON").
Regarding claim 13, the combination of Aksoy and Luo teaches all the limitations of claim 1 above. The combination of Aksoy and Luo does not expressly teach wherein the first depth value comprises an auto-focus depth corresponding to a first input image frame captured by the first image sensor.
However, YOON teaches wherein the first depth value comprises an auto-focus depth corresponding to a first input image frame captured by the first image sensor ([0041] The controller 270 identifies a focus position depending on a distance of the camera away from the subject in the autofocus procedure and controls the driver 230 to move the lens system 210 to the identified focus position; [0042] The autofocus procedure performed by the controller 270 includes the following steps (a) through (f)).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Aksoy and Luo to incorporate the step/system of performing an autofocus procedure that identifies a focus position based on the distance between the camera and the subject, as taught by YOON.
The suggestion/motivation for doing so would have been to reduce autofocus time while retaining or improving the quality of the focusing process ([0003] The present invention relates generally to a camera. More particularly, the present invention relates to an autofocus (AF) method for a camera and a reduction in the time required for the AF to accurately focus on an object; [0020] Accordingly, there is a need in the art for an autofocus method for a camera, by which a quicker autofocus function reduces the inconvenience and discomfort to users but retains or even improves the quality of the focusing process). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Aksoy and Luo with YOON to obtain the invention as specified in claim 13.
With respect to claim 29, arguments analogous to those presented for claim 13 are applicable.
With respect to claim 45, arguments analogous to those presented for claim 13 are applicable.
With respect to claim 61, arguments analogous to those presented for claim 13 are applicable.

Allowable Subject Matter
Claims 6, 22, 38, 54 and 65 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL C. CHANG whose telephone number is (571)270-1277. The examiner can normally be reached Monday-Thursday and Alternate Fridays 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan S. Park, can be reached on (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DANIEL C CHANG/Examiner, Art Unit 2669    
/CHAN S PARK/Supervisory Patent Examiner, Art Unit 2669