Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings filed on 7/13/20 are acceptable subject to correction of the informalities indicated below.  In order to avoid abandonment of this application, correction is required in reply to the Office action.  The correction will not be held in abeyance.
Paragraph [0061] of the specification states, “A reconstruction module 408 reconstructs the image at time t-1 from the motion between time t-1 and time t, the attention map, and the depth (map).” However, Figure 4 of the drawings does not show attention and depth arrows going into reconstruction module 408.
INFORMATION ON HOW TO EFFECT DRAWING CHANGES


Replacement Drawing Sheets

Drawing changes must be made by presenting replacement sheets which incorporate the desired changes and which comply with 37 CFR 1.84.  An explanation of the changes made must be presented either in the drawing amendments section or in the remarks section of the amendment paper.  Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d).  A replacement sheet must include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended.  The figure or figure number of the amended drawing(s) must not be labeled as “amended.”  If the changes to the drawing figure(s) are not accepted by the examiner, applicant will be notified of any required corrective action in the next Office action.  No further drawing submission will be required, unless applicant is notified.
Identifying indicia, if provided, should include the title of the invention, inventor’s name, and application number, or docket number (if any) if an application number has not been assigned to the application. If this information is provided, it must be placed on the front of each sheet and within the top margin. 
Annotated Drawing Sheets
A marked-up copy of any amended drawing figure, including annotations indicating the changes made, may be submitted or required by the examiner.  The annotated drawing sheet(s) must be clearly labeled as “Annotated Sheet” and must be presented in the amendment or remarks section that explains the change(s) to the drawings.
Timing of Corrections
Applicant is required to submit acceptable corrected drawings within the time period set in the Office action. See 37 CFR 1.85(a). Failure to take corrective action within the set period will result in ABANDONMENT of the application. 
If corrected drawings are required in a Notice of Allowability (PTOL-37), the new drawings MUST be filed within the THREE MONTH shortened statutory period set for reply in the “Notice of Allowability.” Extensions of time may NOT be obtained under the provisions of 37 CFR 1.136 for filing the corrected drawings after the mailing of a Notice of Allowability. 

Claim Interpretation – 35 U.S.C. 101
Each of the independent claims recites “determine a first motion between the second time and the first time based on the first pose and the second pose,” which may reasonably be interpreted as an abstract idea in the form of a mathematical concept. However, the additional steps “receive a first image from a first time from a camera; and based on the first image and using the encoder and the decoder, generate a depth map including depths between the camera and objects in the first image” and “generate a first pose of the camera based on the first image; generate a second pose of the camera for a second time based on a second image received from the camera before the first image; and generate a third pose of the camera for a third time based on a third image received from the camera after the first image” integrate the judicial exception (the mathematical concept) into a practical application. For example, paragraph [0043] of the specification discloses: “The present application involves a depth and motion module configured to estimate depth (distance to objects) in images from the camera and to estimate movement of the navigating robot based on the images. The movement may be used, for example, to determine a location of the navigating robot, such as within a building, and to move the navigating robot. The depth may also be used for movement of the robot, such as for object avoidance and/or route planning.” Therefore, the claimed invention provides the benefit of object avoidance and route planning by virtue of depth and pose estimation. Accordingly, the additional elements involving depth map generation and multiple pose calculations improve the functioning of the movement technology and thereby integrate the abstract idea into a practical application (see MPEP 2106.05(a)).

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. For example, the “first means”, “second means”, and “third means” recited in claim 22 are interpreted as invoking 35 U.S.C. 112(f).
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: depth module, pose module, motion module, first reconstruction module, second reconstruction module, and training module in claims 1-19.  
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-6 and 18-22 are rejected under 35 U.S.C. 103 as being unpatentable over Dai et al. (Unsupervised learning of depth estimation based on attention model and global pose optimization, July 2019; hereinafter “Dai”) in view of Kang et al. (U.S. PGPub 20200319652, Apr. 2019; hereinafter “Kang”).
Claim 1: Dai teaches A system, comprising: a depth module including an encoder and a decoder [pg285, Col2, Para5 “depth network consists of three parts: Encoder, Attention model and Decoder.”] and configured to: receive a first image from a first time from a camera; and based on the first image and using the encoder and the decoder, generate a depth map [pg284, col1, para2 “we focus on depth map estimation of monocular image in this study.” A monocular image is known in the art to be an image received from a single camera] including depths between the camera and objects in the first image [pg290, col2, para2 “The depth map network uses an attention model to preserve the details of the depth map, which enables the network to maintain the shape of objects”. It is well understood and known in the art that the depth map of an image is the depth between the camera and the objects in the image]; a pose module configured to: generate a first pose of the camera based on the first image; generate a second pose of the camera for a second time based on a second image received from the camera before the first image; and generate a third pose of the camera for a third time based on a third image received from the camera after the first image; [Pg287, Col1, Para6 “Camera pose estimation with geometric constraints: A global pose calculation method, based on simultaneous localization and mapping techniques [38,39], is used to estimate the camera pose in this study. It mainly learns the general representation of the relationship between landmark, geometry and camera pose. GPC method is based on key frames for camera pose tracking and optimization” and Pg287, Col2, Para5&6 “Given the depth of target frame and poses with adjacent frames, a rendered target frame can be reconstructed”. The examiner understands the first image, second image and third image to be the same as Dai’s key frames (i.e., images) and adjacent frames (i.e., images).
The examiner understands the depth network to be the same as the inventor’s depth module, and the global pose calculation method to be the same as the inventor’s pose module. The corresponding structure for these means-plus-function “modules” is interpreted to be the hardware element (see [0109] of the specification of the subject application for examples) with the algorithm that causes the hardware element to perform the claimed function. Dai’s system performs the claimed functions (as mapped above) and requires hardware such as a processor to do so.]
Dai does not expressly disclose a motion module configured to: determine a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determine a second motion of the camera between the second time and the third time based on the second pose and the third pose.
Kang teaches a motion module configured to: determine a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determine a second motion of the camera between the second time and the third time based on the second pose and the third pose. [0116] “Referring to FIG. 9, an ego-motion information estimating apparatus may calculate short-term ego-motion information SPk,k-1 from a (k-1)th frame image Ik-1 and a kth frame image Ik based on an ego-motion model 911.” Kang clearly shows in Figure 9 that the outputs of the PoseNet module are the poses, which are the inputs of the final ego-motion module (i.e., the motion module). The examiner interprets Kang’s ego-motion model to be the same as the inventor’s motion module. In particular, the corresponding structure for this means-plus-function “module” is interpreted to be the hardware element (see [0109] of the specification of the subject application for examples) with the algorithm that causes the hardware element to perform the claimed function. Kang’s ego-motion model performs the claimed functions (as mapped above). Further, [0026-0027] of Kang discloses that the “model” is performed by a processor, which is one of the example hardware elements listed in [0109] of the specification of the subject application.


[Image: media_image1.png]

It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify Dai in view of Kang to have a motion module configured to: determine a first motion of the camera between the second time and the first time based on the first pose and the second pose; and determine a second motion of the camera between the second time and the third time based on the second pose and the third pose. The motivation for such a modification would have been to model motion of a vehicle based on input camera images from which depth and pose are determined (See at least [0003] of Kang).
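For illustration only, the kind of computation recited in the motion limitation (a motion determined from two camera poses) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Kang, Dai, or the subject application: poses are assumed to be 4x4 rigid transforms stored as nested lists, and the helper names `make_pose`, `matmul`, `invert_rigid`, and `relative_motion` are hypothetical.

```python
import math

def make_pose(yaw_rad, translation):
    """Build a 4x4 rigid transform (rotation about z plus translation). Hypothetical helper."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def invert_rigid(pose):
    """Invert a rigid transform: the rotation becomes R^T and the translation becomes -R^T t."""
    rot_t = [[pose[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * pose[k][3] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def relative_motion(pose_from, pose_to):
    """Return the motion M such that pose_from @ M == pose_to."""
    return matmul(invert_rigid(pose_from), pose_to)

# Assumed example poses for the second (earlier), first (current), and third (later) times.
pose_second = make_pose(0.3, (1.0, 2.0, 0.0))
pose_first = make_pose(0.4, (1.5, 2.0, 0.0))
pose_third = make_pose(0.5, (2.0, 2.2, 0.0))

first_motion = relative_motion(pose_second, pose_first)   # second time -> first time
second_motion = relative_motion(pose_second, pose_third)  # second time -> third time
```

Composing the second pose with either motion recovers the corresponding target pose, which is the sense in which each motion is "based on" the two poses it connects.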
Claim 2:  Dai in view of Kang teaches the system of Claim 1.  
Dai teaches the depth map. [pg 286, Figure2, and Pg284, Col2, Para2 “unsupervised learning method to obtain the depth map from the monocular image.”]
Dai does not teach A vehicle, a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device.
Kang teaches A vehicle, a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device. [0144] “In various corresponding embodiments, when the computing apparatus 1500 is, or is mounted in/on, a mobile user device, a robotic device, or a vehicle, the computing apparatus 1500 is configured to … control autonomous movement of the robotic device or vehicle”. The examiner interprets the inventor’s control module to be the same as “the computing apparatus configured to control autonomous movement” and the propulsion device configured to propel the vehicle to be the same as “control autonomous movement of the robotic device or vehicle”.
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify Dai in view of Kang to have A vehicle, a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device. The motivation for such a modification would have been to model and control the motion of a vehicle based on input camera images from which depth and pose are determined (See at least [0003] of Kang).
Claim 3: Dai in view of Kang teaches the Vehicle in Claim 2.  
Dai teaches a system that includes the camera and does not include any other cameras. [pg 286, Figure2 and Pg284, Col2, Para2 “unsupervised learning method to obtain the depth map from the monocular image.”]
It is known in the art that a monocular image is a product of a single camera and does not involve any other cameras.
Dai does not teach the vehicle.
Kang teaches the vehicle [0144] “…control autonomous movement of the robotic device or vehicle”.
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify Dai in view of Kang to have the vehicle include the camera and not include any other cameras. The motivation for such a modification would have been to improve the range and accuracy of depth as well as reduce the computational resource usage (see at least pg1, column1, para2 of Dai).
Claim 4: Dai in view of Kang teaches the Vehicle in Claim 2.  
Dai teaches a system that does not include any radars, any sonar sensors, any laser sensors, or any light detection and ranging (LIDAR) sensors. [Pg284, Col2, Para2 “unsupervised learning method to obtain the depth map from the monocular image.”]
It is known in the art that a monocular image is a product of a single camera only and does not involve any other devices such as radars, sonar sensors, laser sensors or any LIDAR sensors.
Dai does not teach the vehicle.
Kang teaches the vehicle [0144] “…control autonomous movement of the robotic device or vehicle”.
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify Dai in view of Kang to have a vehicle which does not include any radars, any sonar sensors, any laser sensors, or any light detection and ranging (LIDAR) sensors. The motivation for such a modification would have been to reduce the system cost and allow for system flexibility (see at least pg1, column1, para2 of Dai).
Claim 5: Dai in view of Kang teaches the system in Claim 1.  
Dai does not teach A vehicle, comprising: a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device based on at least one of: the first motion; and the second motion.
Kang teaches A vehicle, comprising: a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device based on at least one of: the first motion; and the second motion. [0144], Figure 16 “In various corresponding embodiments, when the computing apparatus 1500 is, or is mounted in/on, a mobile user device, a robotic device, or a vehicle, the computing apparatus 1500 is configured to estimate a position and a pose of the mobile user device, robotic device, or vehicle based on the final short-term ego-motion information and the final long-term ego-motion information.. and control autonomous movement of the robotic device or vehicle based on the same, as non-limiting examples.” The examiner understands the control module configured to actuate the propulsion device to be the same as “the computing apparatus configured to control autonomous movement”, the propulsion device configured to propel the vehicle to be the same as “control autonomous movement of the robotic device or vehicle”, based on at least one of the first motion to be the same as “based on final short-term ego-motion information”, and the second motion to be the same as “and the final long-term ego-motion information”.

[Image: media_image2.png]

It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify Dai in view of Kang to have A vehicle, comprising: a propulsion device configured to propel the vehicle; and a control module configured to actuate the propulsion device based on at least one of: the first motion; and the second motion. The motivation for such a modification would have been to be able to move the vehicle autonomously (See at least [0144] of Kang). 
Claim 6: Dai in view of Kang teaches the system in Claim 1. 
Dai teaches wherein the first, second, and third poses are 6 degree of freedom poses. [Pg287, Para7 “calculates 6-DoF of the camera between two frames”]
The examiner interprets this to mean that there are 6 degrees of freedom between each pair of poses (between the first and second, and between the second and third) because the poses are determined based on images, i.e., frames.
Claim 18:  Dai in view of Kang teaches the system in Claim 1. 
Dai teaches the pose module is configured to generate the first, second, and third poses using a PoseNet algorithm. [Pg287, Col1, Para5 “PoseNet to learn the pose between frames.”].
Claim 19: Dai in view of Kang teaches the system in Claim 1. 
Dai teaches the depth module includes a DispNet encoder-decoder network. [Pg 285, Col2, Para5 “The Details preserved depth network consists of three parts: Encoder, Attention model and Decoder. We modify the DispNet proposed in [11,16] as follows. The attention model is utilized in the encoder and implicitly affects the decoder in order to modify the integral DispNet.”]
Claim 20: The method recited in this claim is performed by the system of Claim 1, and the claim is therefore likewise rejected.
Claim 21: Independent claim 21 recites a system, comprising: one or more processors; and memory including code that, when executed by the one or more processors, perform functions ([0140] of Kang discloses a processor and a storage device which stores instructions executable by the processor to perform the disclosed algorithm) including those recited in independent claim 1. Accordingly, claim 21 is rejected for reasons analogous to those discussed above in conjunction with claim 1.
Claim 22: Independent claim 22 recites a system, comprising: a first means…; a second means…; and a third means for performing the steps recited in independent claim 1. Accordingly, claim 22 is rejected for reasons analogous to those discussed above in conjunction with claim 1. The corresponding structure for the claimed “first means”, “second means”, and “third means” is analogous to the corresponding structure discussed above in conjunction with claim 1 for the claimed “depth module”, “pose module”, and “motion module”, respectively.
Claims 7-16 are rejected under 35 U.S.C. 103 as being unpatentable over Dai et al. (Unsupervised learning of depth estimation based on attention model and global pose optimization, July 2019; hereinafter “Dai”) in view of Kang et al. (U.S. PGPub 20200319652, Apr. 2019; hereinafter “Kang”) and further in view of Oktay et al. (Attention U-Net: Learning Where to Look for the Pancreas, May 2018; cited in the IDS filed 8/10/20; hereinafter “Oktay”).
Claim 7: Dai in view of Kang teaches the system in Claim 1. 
Dai teaches wherein the depth module includes attention mechanisms configured to, based on the first image, generate an attention map [Pg285, Col2, Para3 “The depth map network uses an attention model to preserve the details of the depth map, which enables the network to maintain the shape of objects and enhance edges of the depth map.” The examiner understands the depth map network to be the same as the inventor’s depth module. The corresponding structure for this means-plus-function “module” is interpreted to be the hardware element (see [0109] of the specification of the subject application for examples) with the algorithm that causes the hardware element to perform the claimed function. Dai’s system performs the claimed functions (as mapped above) and requires hardware such as a processor to do so.]
Dai does not teach including attention coefficients indicative of amounts of attention to attribute to the objects in the first image.
Oktay teaches including attention coefficients indicative of amounts of attention to attribute to the objects in the first image. [pg4, Para2 “Attention coefficients identify salient image regions and prune feature responses to preserve only the activations relevant to the specific task”]
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify the proposed combination of Dai and Kang to include Oktay’s attention coefficients indicative of amounts of attention to attribute to the objects in the first image. The motivation for such a modification would have been so that the system can efficiently focus the model’s attention on important regions of the image while suppressing unrelated regions (See at least Abstract of Oktay).
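For illustration only, an attention coefficient of the kind discussed above can be sketched as a value between 0 and 1 that scales the features at one image location. The sketch below uses an additive gate (ReLU followed by a sigmoid), which is the form generally associated with attention gates; this Office action does not reproduce Oktay's formula, so the gate form, the weight layout, and the names `attention_coefficient` and `gate_features` are assumptions made for the example.

```python
import math

def attention_coefficient(x, g, w_x, w_g, b_g, psi, b_psi):
    """Additive attention for one pixel.

    x: feature vector from the skip connection; g: gating vector from a coarser layer.
    w_x, w_g: weight matrices (one row per hidden unit); psi: output weights.
    All parameters here are hypothetical illustration values, not learned weights.
    """
    hidden = []
    for row_x, row_g, bias in zip(w_x, w_g, b_g):
        pre = (sum(w * xi for w, xi in zip(row_x, x))
               + sum(w * gi for w, gi in zip(row_g, g))
               + bias)
        hidden.append(max(0.0, pre))  # ReLU
    q = sum(p * h for p, h in zip(psi, hidden)) + b_psi
    return 1.0 / (1.0 + math.exp(-q))  # sigmoid keeps the coefficient in (0, 1)

def gate_features(x, alpha):
    """Scale the features by the coefficient, suppressing low-attention locations."""
    return [alpha * xi for xi in x]

alpha = attention_coefficient(
    x=[1.0, 2.0], g=[0.5],
    w_x=[[1.0, 0.0], [0.0, 1.0]], w_g=[[1.0], [-1.0]], b_g=[0.0, 0.0],
    psi=[1.0, -1.0], b_psi=0.0,
)
gated = gate_features([1.0, 2.0], alpha)
```

A coefficient near 1 passes the features through largely unchanged, while a coefficient near 0 prunes them, which corresponds to the quoted description of identifying salient regions and pruning feature responses.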
Claim 8:  Dai in view of Kang and Oktay teaches the system in Claim 7.  
The proposed combination of Dai and Kang does not teach the attention mechanisms include attention gates. 
Oktay teaches the attention mechanisms include attention gates [Pg3, figure1 shows the attention mechanisms include attention gates]
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify the proposed combination of Dai and Kang such that the attention mechanisms include attention gates, as taught by Oktay. The motivation for such a modification would have been so that the system can efficiently focus the model’s attention on important regions of the image while suppressing unrelated regions (See at least Abstract of Oktay).

[Image: media_image3.png]

Claim 9: Dai in view of Kang and Oktay teaches the system in Claim 7.  
The proposed combination of Dai and Kang does not teach the decoder includes the attention mechanism. 
Oktay teaches the decoder includes the attention mechanism [Pg3, figure1 shows the decoder includes the attention mechanism]
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify the proposed combination of Dai and Kang such that the decoder includes the attention mechanism, as taught by Oktay. The motivation for such a modification would have been so that the system can efficiently focus the model’s attention on important regions of the image while suppressing unrelated regions (See at least Abstract of Oktay).
Claim 10: Dai in view of Kang and Oktay teaches the system in Claim 9.  
The proposed combination of Dai and Kang does not teach the encoder does not include any attention mechanisms.
Oktay teaches the encoder does not include any attention mechanisms. [Pg3, figure1 shows the attention mechanisms are on the decoder side of the CNN and not on the encoder side]
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to modify the proposed combination of Dai and Kang in view of Oktay such that the encoder does not include any attention mechanisms. The motivation for such a modification would have been so that the system can efficiently focus the model’s attention on important regions of the image while suppressing unrelated regions (See at least Abstract of Oktay).
Claim 11: Dai in view of Kang and Oktay teaches the system in Claim 9.  
The proposed combination of Dai and Kang does not teach the decoder includes decoder layers and the attention mechanisms are interleaved with the decoder layers.
Oktay teaches the decoder includes decoder layers and the attention mechanisms are interleaved with the decoder layers. [Pg3, figure1 shows the decoder includes decoder layers and the attention mechanisms are interleaved with the decoder layers.]
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to have modified the proposed combination of Dai and Kang such that the decoder includes decoder layers and the attention mechanisms are interleaved with the decoder layers. The motivation for such a modification would have been so that the system can efficiently focus the model’s attention on important regions of the image while suppressing unrelated regions. (See at least Abstract of Oktay).
Claim 12: Dai in view of Kang and Oktay teaches the system in Claim 7.  
Dai teaches a first reconstruction module configured to reconstruct the second image using the attention map to produce a reconstructed second image; a second reconstruction module configured to reconstruct the third image using the attention map to produce a reconstructed third image; [pg286, Figure1 “The depth map of target view and the relative pose with adjacent views are used to reconstruct the target view,..”] The depth map in Dai contains within it the Attention map [pg285, col2 para3 “The depth map network uses an attention model to preserve the details of the depth map, which enables the network to maintain the shape of objects and enhance edges of the depth map.”]. 
Dai further teaches a training module configured to, based on at least one of the reconstructed second image and the reconstructed third image, selectively adjust at least one parameter of at least one of depth module, the pose module, and the motion module. [pg286, Figure1 caption “Training pipeline of unsupervised depth estimation… the reconstruct photometric error is used as loss function to optimize the depth map”.]
The examiner interprets the claimed training module to be the same as the “Training pipeline”. In particular, the corresponding structure for this means-plus-function “module” is interpreted to be the hardware element (see [0109] of the specification of the subject application for examples) with the algorithm that causes the hardware element to perform the claimed function. Dai’s training pipeline performs the claimed functions (as mapped above). Further, Dai discloses that the “module” is performed by a processor, which is one of the example hardware elements listed in [0109] of the specification of the subject application.
The examiner interprets the claimed based on at least one of the reconstructed second image and the reconstructed third image to be the same as “reconstruct photometric error is used as loss function”. 
The examiner interprets the claimed selectively adjust at least one parameter of at least one of depth module to be the same as “optimize the depth map”.

[Image: media_image4.png]

Claim 13:  The proposed combination of Dai, Kang and Oktay teaches the system in Claim 12. 
Dai teaches the training module [Pg286, figure 1 “Training pipeline…”] is configured to selectively adjust the at least one parameter based on the reconstructed second image, the reconstructed third image, the second image, and the third image. [Pg286, figure 1 “…the relative pose with adjacent views are used to reconstruct the target view, the reconstruct photometric error is used as loss function to optimize the depth map”]

[Image: media_image4.png]

Claim 14: The proposed combination of Dai, Kang and Oktay teaches the system in Claim 13. 
Dai teaches the training module [Pg286, figure 1 “Training pipeline…”] is configured to selectively adjust the at least one parameter based on: a first difference between the reconstructed second image and the second image; and a second difference between the reconstructed third image and the third image. [Pg288, Col1, Para1 “Suppose that 𝐹𝑟𝑖 is the reconstructed image while 𝐹𝑜𝑖 represents the original image. The objective function of the view reconstruction process 𝐿𝑣𝑟 can be expressed as: 𝐿𝑣𝑟 = 𝛴⟨𝐹1,𝐹2,…,𝐹𝑛⟩ 𝛴𝑝𝑙 |𝐹𝑜𝑖(𝑝𝑙) − 𝐹𝑟𝑖(𝑝𝑙)|, where 𝑝𝑙 is the pixel.”]. The examiner interprets the original image of the respective adjacent frames to be one of the claimed second image or third image. 
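For illustration only, the view reconstruction objective quoted above (a sum, over views and over pixels, of the absolute difference between each original image and its reconstruction) can be sketched as follows. The function name, the array shapes, and the sample pixel values are hypothetical and are not taken from Dai or from the subject application; the sketch only shows how a first difference and a second difference combine into a single loss value.

```python
import numpy as np

def view_reconstruction_loss(originals, reconstructions):
    """Sum of per-pixel absolute differences over all views (hypothetical helper)."""
    if len(originals) != len(reconstructions):
        raise ValueError("need exactly one reconstruction per original view")
    total = 0.0
    for f_o, f_r in zip(originals, reconstructions):
        f_o = np.asarray(f_o, dtype=np.float64)
        f_r = np.asarray(f_r, dtype=np.float64)
        if f_o.shape != f_r.shape:
            raise ValueError("original and reconstruction must share a shape")
        # Inner sum over pixels for this view.
        total += np.abs(f_o - f_r).sum()
    return float(total)

# Made-up 2x2 grayscale frames standing in for the second and third images.
second = np.array([[0.0, 1.0], [2.0, 3.0]])
third = np.array([[1.0, 1.0], [1.0, 1.0]])
recon_second = np.array([[0.0, 1.5], [2.0, 2.0]])
recon_third = np.array([[1.0, 1.0], [1.0, 0.0]])

first_difference = view_reconstruction_loss([second], [recon_second])
second_difference = view_reconstruction_loss([third], [recon_third])
loss = view_reconstruction_loss([second, third], [recon_second, recon_third])
```

In a training loop, a value such as `loss` would be the quantity minimized when adjusting network parameters; the optimizer itself is outside the scope of this sketch.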
Claim 15: The proposed combination of Dai, Kang and Oktay teaches the system in Claim 12. 
Dai teaches the training module is configured to jointly train the depth module, the pose module [pg286, figure1 “Training pipeline of unsupervised depth estimation, which consists of two streams, the depth estimation stream and pose estimation stream…”]. The examiner interprets the claimed depth module and pose module to be the same as “the depth estimation stream and pose estimation stream”: the depth estimation stream corresponds to the inventor’s depth module, and the pose estimation stream corresponds to the inventor’s pose module. The corresponding structure for this means-plus-function “module” is interpreted to be the hardware element (see [0109] of the specification of the subject application for examples) with the algorithm that causes the hardware element to perform the claimed function. Dai’s system performs the claimed functions (as mapped above) and requires hardware such as a processor to do so.
Dai does not teach training the motion module. 
Kang does teach training the motion module: [0024] “training any one or any combination of any two or more of the ego-motion model, the attention model, and the depth model based on a loss calculated from the warped image and a current frame image among the training images”. The examiner interprets the claimed motion module to be the same as the “ego-motion model”. The corresponding structure for this means-plus-function “module” is interpreted to be the hardware element (see [0109] of the specification of the subject application for examples) with the algorithm that causes the hardware element to perform the claimed function. Kang’s ego-motion model performs the claimed functions (as mapped above) and requires hardware such as a processor to do so. Further, Kang discloses that the “model” is performed by a processor, which is one of the example hardware elements listed in [0109] of the specification of the subject application.
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to have modified the proposed combination of Dai and Oktay in view of Kang such that the training module is configured to jointly train the depth module, the pose module, and the motion module. The motivation for such a modification would have been to train and control the motion of a vehicle based on input camera images from which depth and pose are determined (See at least [0003] of Kang).
Claim 16: The proposed combination of Dai, Kang, and Oktay teaches the system in Claim 12. 
Dai teaches the first reconstruction module is configured to reconstruct the second image using the attention map; and the second reconstruction module is configured to reconstruct the third image using the attention map. [pg286, Figure1 “The depth map of target view and the relative pose with adjacent views are used to reconstruct the target view,..”] The depth map in Dai contains within it the Attention map [pg285, col2 para3 “The depth map network uses an attention model to preserve the details of the depth map, which enables the network to maintain the shape of objects and enhance edges of the depth map.”]. 
[Image: media_image4.png]

Dai does not teach an image warping algorithm.
Kang does teach an image warping algorithm: [0024] “training any one or any combination of any two or more of the ego-motion model, the attention model, and the depth model based on a loss calculated from the warped image and a current frame image among the training images”. The image warping algorithm is further detailed in claim 20: “wherein the generating of the warped image preferably comprises: generating a three-dimensional (3D) coordinate image corresponding to the previous frame from the temporary depth information; restoring a 3D coordinate image corresponding to a current frame by converting the 3D coordinate image corresponding to the previous frame using the temporary short-term ego-motion information calculated from the temporary long-term ego-motion information; and generating the warped image by projecting the 3D coordinate image corresponding to the current frame two-dimensionally such that the warped image is two dimensionally warped.”
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to have modified the proposed combination of Dai and Oktay in view of Kang to have an image warping algorithm.  The motivation for such a modification would have been to train the system by calculating a loss from the warped image (See at least [0108] of Kang).
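For illustration only, the three steps recited in the quoted claim 20 of Kang (generate a 3D coordinate image from depth, convert it with ego-motion, project it back to two dimensions) can be sketched with a pinhole camera model. The pinhole model, the intrinsic matrix `K`, the rotation `R`, the translation `t`, and all function names are assumptions introduced for this sketch; Kang is not quoted here as using these exact operations.

```python
import numpy as np

def backproject(depth, K):
    """Step 1: lift each pixel to a 3D point using its depth (assumed pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pixels @ np.linalg.inv(K).T              # K^-1 applied to every pixel
    return rays * depth[..., None]                  # (H, W, 3) coordinate image

def transform(points, R, t):
    """Step 2: move the 3D coordinate image with a rigid ego-motion (R, t)."""
    return points @ R.T + t

def project(points, K):
    """Step 3: project the 3D coordinate image back to 2D pixel coordinates."""
    proj = points @ K.T
    return proj[..., :2] / proj[..., 2:3]           # (H, W, 2) array of (u, v)

# Hypothetical camera and motion: focal length 100 px, 2 cm sideways translation.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)                        # flat scene 2 units away
points_prev = backproject(depth, K)
points_curr = transform(points_prev, np.eye(3), np.array([0.02, 0.0, 0.0]))
coords = project(points_curr, K)
```

With these made-up numbers every pixel moves one column to the right (focal length times translation divided by depth), which is the kind of per-pixel correspondence a warped image is built from.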

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Dai in view of Kang and Oktay, and further in view of Zhou et al. (Unsupervised Learning of Depth and Ego-Motion from Video, 2017; cited in the IDS filed 8/10/20; hereinafter “Zhou”).
Claim 17: Dai in view of Kang and Oktay teaches the system in Claim 16. 
The proposed combination of Dai, Kang and Oktay does not teach that the image warping algorithm includes an inverse image warping algorithm. 
Zhou does teach the image warping algorithm includes an inverse image warping algorithm.  [Pg 6614, Figure2 caption “The outputs of both networks are then used to inverse warp the source views (see Sec. 3.2) to reconstruct the target view”]
It would have been obvious to persons of ordinary skill in the art before the effective filing date of the invention to have modified the proposed combination of Dai, Kang and Oktay in view of Zhou such that the image warping algorithm includes an inverse image warping algorithm. The motivation for such a modification would have been to be able to train the system in an unsupervised manner (see at least caption of figure 2 of Zhou). 
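For illustration only, inverse warping can be sketched as follows: for every pixel of the target view, a location in the source view is looked up and the source image is sampled there, so the output has no holes. The function name, the use of bilinear interpolation, and the zero fill for out-of-range samples are assumptions of this sketch and are not quoted from Zhou or from the subject application.

```python
import numpy as np

def inverse_warp(source, coords):
    """Reconstruct a target view by sampling `source` at `coords`.

    source: (H, W) image. coords: (H, W, 2) array holding, for each target
    pixel, the (u, v) location to read from in the source image.
    """
    source = np.asarray(source, dtype=np.float64)
    h, w = source.shape
    u, v = coords[..., 0], coords[..., 1]
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    out = np.zeros(coords.shape[:2], dtype=np.float64)
    # Blend the four neighbouring source pixels with bilinear weights.
    neighbours = [(0, 0, (1 - du) * (1 - dv)), (1, 0, du * (1 - dv)),
                  (0, 1, (1 - du) * dv), (1, 1, du * dv)]
    for ou, ov, weight in neighbours:
        uu, vv = u0 + ou, v0 + ov
        valid = (uu >= 0) & (uu < w) & (vv >= 0) & (vv < h)
        vals = np.zeros_like(out)
        vals[valid] = source[vv[valid], uu[valid]]  # out-of-range samples stay 0
        out += weight * vals
    return out

# Made-up 3x4 source image and a half-pixel shift to the right.
source = np.arange(12, dtype=np.float64).reshape(3, 4)
grid_u, grid_v = np.meshgrid(np.arange(4), np.arange(3))
identity_coords = np.stack([grid_u, grid_v], axis=-1).astype(np.float64)
same = inverse_warp(source, identity_coords)
half_shift = inverse_warp(source, identity_coords + np.array([0.5, 0.0]))
```

In a full pipeline the `coords` array would come from the predicted depth and relative pose rather than from a fixed shift.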
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Xiao et al., Method for Calibrating a Multi-Sensor System Using an Artificial Neural Network, U.S. Patent Application Publication No. US 2020/0378793 A1, filed May 20, 2020.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OWAIS MEMON whose telephone number is (571)272-2168. The examiner can normally be reached M-F (7:00am - 4:30pm) EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Justin Mikowski, can be reached on (571) 272-8525. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/OWAIS IQBAL MEMON/Examiner, Art Unit 4184
/SEAN M CONNER/Primary Examiner, Art Unit 2663