DETAILED ACTION

Notice of Pre-AIA or AIA Status
         The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
     The information disclosure statement (IDS) submitted on 09/06/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
     Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
     The abstract of the disclosure is objected to because it is 332 words in length, which exceeds the 150-word limit.  Correction is required.  See MPEP § 608.01(b).

Claim Rejections - 35 USC § 112
     The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


           The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

     Claim(s) 1 - 20 is/are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1, line 4 recites “the same scene”. However, “a same scene” has not been introduced in the preamble. There is insufficient antecedent basis for this limitation in the claim.
     Examiner notes that:
Claims 2 – 3, 5 – 8, 10 – 13, 15 – 20 are all dependent on claim 1;
Claims 4, 9 and 14 are respectively dependent on claims 3, 8 and 13; and
Considering that claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite (see above rejection), claims 2 – 20 are rejected for the same reason by virtue of their dependency on claim 1.


Claim Rejections - 35 USC § 103
     In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
     The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

     Claim(s) 1 - 22 is/are rejected under 35 U.S.C. 103 as being obvious over Schmid et al. (US 2020/0349722 A1) in view of Angelova et al. (US 2019/0279383 A1).
     The applied references have a common applicant with the instant application. Based upon the earlier effective filing date of each reference, they constitute prior art under 35 U.S.C. 102(a)(2).
      Regarding claim 1, Schmid discloses a system comprising one or more computers and one or more non-transitory storage devices storing instructions (see abstract and see paragraph 0017) that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
receiving a sequence of input images that depict the same scene, the input images being captured by a camera at different time steps, the sequence of input images comprising a current input image and one or more input images preceding the current image in the sequence (see lines 1-6 of paragraph 0019 on page 3, wherein Schmid discloses consecutive images in a sequence of images; also see the first image 212 in Fig. 2, which is equivalent to the current input image);
processing the current input image to generate a segmentation map for potential objects in the current input image (see the term “segmentation masks” on element 126 of Fig. 1; and see lines 1 – 7 of paragraph 0025, page 23) and a respective depth map for the current input image (see the term “depth map” on element 128 of Fig. 1; and see paragraph 0023);
generating a point cloud for the current input image using the segmentation map and the depth map of the current input image, wherein the point cloud is a 3-dimensional (3D) structure representation of the scene as depicted in the current input image (see element 130 in Fig. 1; also see lines 3 – 9 of paragraph 0031);
processing the sequence of input images using an ego-motion estimation neural network to generate, for each pair of two consecutive input images in the sequence (see the Camera Motion NN layers 102 corresponding to an ego-motion neural network; and see paragraph 0027), a respective ego-motion output that characterizes motion of the camera between the two consecutive 
processing the point cloud of the current input image and the future ego-motion output to generate a future point cloud that is a predicted 3D representation of a future scene as depicted in the future image (see the Final Point Cloud 232 of Fig. 2; and see lines 10 – 12 of paragraph 0049); and
processing the future point cloud to generate a predicted segmentation map for potential objects in the future scene depicted in the future image (see Optical Flow 120 of Fig. 1 corresponding to the predicted segmentation map). 
           Schmid does not explicitly teach the step of processing the ego-motion outputs using a future ego-motion prediction neural network to generate a future ego-motion output that is a prediction of future motion of the camera from the current input image in the sequence to a future image, wherein the future image is an image that would be captured by the camera at a future time step.
           However, Angelova teaches the step of processing the ego-motion outputs using a future ego-motion prediction neural network to generate a future ego-motion output that is a prediction of future motion of the camera from the current input image in the sequence to a future image, wherein the future image is an image that would be captured by the camera at a future time step (see the terms “future images”, “hypothetical camera motion” and “multiple time steps ahead” in lines 1 – 7 of paragraph 0039, wherein Angelova discloses a system that predicts future depth maps of future images by using hypothetical camera motion or future motion of the camera at time steps ahead or future time step).
           It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the system of Schmid to incorporate the future ego-motion prediction as taught by Angelova.

           The motivation for doing so would have been to accurately estimate and model the future camera motion for predicting future image segmentations.
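For illustration only, the point-cloud limitations of claim 1 can be sketched as follows. This is a minimal sketch assuming a pinhole camera model; it is not taken from Schmid, Angelova, or the instant application. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, the tuple layout, and the sign convention of the translation are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical point layout: (X, Y, Z, class_label) in camera coordinates.
Point = Tuple[float, float, float, int]


def depth_to_point_cloud(depth: Sequence[Sequence[float]],
                         labels: Sequence[Sequence[int]],
                         fx: float, fy: float,
                         cx: float, cy: float) -> List[Point]:
    """Back-project every pixel (u, v) with depth Z into 3D (assumed pinhole model)."""
    cloud: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # The segmentation label of the pixel travels with its 3D point.
            cloud.append((x, y, z, labels[v][u]))
    return cloud


def apply_ego_motion(cloud: Sequence[Point],
                     rotation: Sequence[Sequence[float]],
                     translation: Sequence[float]) -> List[Point]:
    """Rigidly move each point, p' = R p + t, leaving its label unchanged."""
    moved: List[Point] = []
    for x, y, z, label in cloud:
        p = (x, y, z)
        nx, ny, nz = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        moved.append((nx, ny, nz, label))
    return moved


# Toy 2x2 depth map and label map (made-up values).
depth = [[2.0, 2.0], [4.0, 4.0]]
labels = [[0, 1], [1, 0]]
cloud = depth_to_point_cloud(depth, labels, fx=1.0, fy=1.0, cx=0.5, cy=0.5)

# A predicted future ego-motion: no rotation, points end up 1 unit closer.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
future_cloud = apply_ego_motion(cloud, identity, (0.0, 0.0, -1.0))
```

Here `cloud` stands in for the point cloud of the current input image and `future_cloud` for the future point cloud; in the claimed system the rotation and translation would come from the future ego-motion output rather than being fixed by hand.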
     Regarding claim 2, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid further discloses the step of the sequence of input images comprises the current input image and three or more images preceding the current input image in the sequence (see lines 1 – 4 of paragraph 0019, wherein Schmid discloses that the sequence of input images may comprise a first image as a current input image and a plurality of other images).
     Regarding claim 3, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid discloses the step of the sequence of input images are frames of a video captured by the camera (see lines 5 – 6 of paragraph 0019; Schmid discloses that frames of a video are taken by a camera), wherein the frames of the video are separated by a fixed number of time intervals, wherein the fixed number is greater than one (see paragraph 0054, wherein Schmid discloses It+1, with t+1 as the fixed number of time intervals greater than one).
     Regarding claim 4, the combination of Schmid in view of Angelova teaches the system of claim 3, wherein Schmid further discloses the step of the fixed number of time intervals includes three time intervals (see the term “every other frame” in line 6 
     Regarding claim 5, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Angelova further discloses the step of the future time step is three time steps in the future with respect to the time step at which the current input image is captured (see the term “time steps t ahead” in lines 2-3 of paragraph 0039; equivalent to future time step. One of ordinary skill in the art could set three or more time steps in the future).
     Regarding claim 6, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid also discloses the step of the operations for processing the current input image to generate the segmentation map for potential objects in the current input image and the respective depth map for the current input image comprises: processing the current input image using a segmentation neural network to generate the segmentation map for potential objects in the current input image (see block 114 of Fig. 1; and see paragraph 0024), and processing the current input image using a depth estimation neural network to generate the depth map for the current input image (see block 116 of Fig. 1; and see paragraph 0023).
     Regarding claim 7, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid discloses the step of processing the future point cloud to generate the predicted segmentation map for potential objects in the future scene depicted in the future image further comprises: projecting the future point cloud to a two-dimensional plane to obtain projected points in the plane, wherein the projected points form the predicted segmentation map for potential objects in the 
     Angelova teaches that the two-dimensional plane is at a predetermined distance from the camera and is orthogonal to the principal axis of the camera (see the term “orthogonal to the camera’s principal axis” in lines 23 – 25 of paragraph 0038).
     It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the method of Schmid to incorporate the teachings of predetermined distance and orthogonality to the camera principal axis as taught by Angelova.
     The motivation for doing so would have been to update the depth value of each projected point in the plane based on a respective newly-calculated distance from its corresponding 3D point in the point cloud to the plane.
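For illustration only, projecting a labelled point cloud onto a plane that lies at a predetermined distance from the camera and is orthogonal to its principal axis can be sketched as below. This is a minimal sketch under an assumed pinhole model with a nearest-point-wins rule for pixels hit by several points; it is not asserted to be the method of either reference. The names `project_to_plane`, `focal`, `cx`, `cy`, and the fill value `-1` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, int]  # hypothetical (X, Y, Z, class_label)


def project_to_plane(cloud: Sequence[Point], focal: float,
                     width: int, height: int,
                     cx: float, cy: float) -> List[List[int]]:
    """Project 3D points onto the plane Z = focal and keep the nearest label per pixel."""
    seg = [[-1] * width for _ in range(height)]            # -1 marks an empty pixel
    zbuf = [[float("inf")] * width for _ in range(height)]  # nearest depth seen so far
    for x, y, z, label in cloud:
        if z <= 0:
            continue  # point is behind the camera and cannot be projected
        u = int(round(focal * x / z + cx))
        v = int(round(focal * y / z + cy))
        if 0 <= u < width and 0 <= v < height and z < zbuf[v][u]:
            zbuf[v][u] = z
            seg[v][u] = label
    return seg


# Made-up points: two fall on the same pixel, one is behind the camera.
toy_cloud = [
    (0.0, 0.0, 2.0, 1),
    (0.0, 0.0, 5.0, 2),
    (1.0, 0.0, 1.0, 3),
    (0.0, 0.0, -1.0, 4),
]
predicted_seg = project_to_plane(toy_cloud, focal=1.0, width=3, height=1, cx=1.0, cy=0.0)
```

In this toy case the point with label 1 hides the farther point with label 2, and the leftmost pixel stays empty because no point projects onto it.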
     Regarding claim 8, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Angelova discloses the step of the future ego-motion prediction neural network (corresponding to the image generation subsystem 104 of Fig. 1) is a recurrent neural network (see the term “recurrent neural network” in paragraph 0031) that is configured to receive as input the plurality of ego-motion outputs and to generate the future ego-motion output that is the prediction of future motion of the camera from the current input image in the sequence to the future 
     Regarding claim 9, the combination of Schmid in view of Angelova teaches the system of claim 8, wherein Angelova further discloses the step of the recurrent neural network includes a plurality of Long Short-Term Memory (LSTM) neural network layers (see lines 4 – 8 of paragraph 0031).
     Regarding claim 10, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid discloses the step of the future ego-motion prediction neural network has been trained using an unsupervised learning technique (see the term “unsupervised” in paragraph 0099, wherein Schmid discloses that the neural networks of the system may use unsupervised training).
     Regarding claim 11, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid also discloses the step of the segmentation map of the current input image comprises, for each pixel of a plurality of pixels in the current input image, an estimated probability distribution over a predefined number of object classes that represents, for each predefined object class, a respective probability that the pixel belongs to the predefined object class (see lines 4 – 11 of paragraph 0025).
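For illustration only, a segmentation map of the kind recited in claim 11 can be pictured as one probability distribution per pixel. The sketch below assumes a nested-list layout (rows, then pixels, then one probability per object class) and reduces it to a map of most likely classes; the layout, the function name, and the tolerance are hypothetical and are not drawn from the references.

```python
from typing import List, Sequence


def most_likely_classes(seg_map: Sequence[Sequence[Sequence[float]]],
                        tol: float = 1e-6) -> List[List[int]]:
    """Check that each pixel holds a valid distribution and return its top class index."""
    label_map: List[List[int]] = []
    for row in seg_map:
        label_row: List[int] = []
        for probs in row:
            # A probability distribution over the predefined classes must sum to one.
            if abs(sum(probs) - 1.0) > tol:
                raise ValueError("per-pixel probabilities must sum to 1")
            label_row.append(max(range(len(probs)), key=probs.__getitem__))
        label_map.append(label_row)
    return label_map


# One row of two pixels, three predefined object classes (made-up values).
seg_map = [[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]]
label_map = most_likely_classes(seg_map)
```

The first pixel is assigned class 0 and the second class 2.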
     Regarding claim 12, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid discloses the step of the depth map comprises an estimated depth value for each pixel of a plurality of pixels in the current input image that represents a respective distance of a scene depicted at the pixel from a focal plane of the current input image (see lines 6 – 11 of paragraph 0023).
     Regarding claim 13, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid further discloses that, for each pair of input images, the respective ego-motion output that characterizes motion of the camera between the two input images is an ego-motion vector that defines rotation and translation of the camera from its point of view while taking one input image to its point of view while taking the other input image (see the term “rotation and translation” in lines 1 – 4 of paragraph 0028).
     Regarding claim 14, the combination of Schmid in view of Angelova teaches the system of claim 13, wherein Angelova discloses the step of the ego-motion vector includes three values for three translation components and three values for three rotation components (see Angelova lines 17 – 18 of paragraph 0038; with tx, ty, tz as translation components and rx, ry, rz as rotation components).
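For illustration only, a six-value ego-motion vector (tx, ty, tz, rx, ry, rz) can be turned into a rotation matrix and a translation as sketched below. The sketch assumes the rotation components are Euler angles in radians composed in the order Rz·Ry·Rx; neither reference is asserted to use this parameterization or order, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def ego_motion_to_transform(vec: Sequence[float]) -> Tuple[Matrix, Tuple[float, float, float]]:
    """Split (tx, ty, tz, rx, ry, rz) into a rotation matrix and a translation."""
    tx, ty, tz, rx, ry, rz = vec
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    rot_y = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rot_z = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    # Assumed composition order: rotate about x, then y, then z.
    return _matmul(rot_z, _matmul(rot_y, rot_x)), (tx, ty, tz)


# Made-up vector: small translation plus a quarter turn about the z axis.
rotation, translation = ego_motion_to_transform([0.5, 0.0, -1.0, 0.0, 0.0, math.pi / 2])
```

With this vector the rotation maps the x axis onto the y axis (up to floating-point error) and the translation is returned unchanged.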
     Regarding claim 15, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Angelova further discloses the step of the future ego-motion output is a predicted ego-motion vector that would define rotation and translation of the camera from its point of view while taking the current input image to its predicted point of view while taking the future image (see lines 8 – 18 of paragraph 0038, wherein Angelova discloses a camera pose vector Pi used to represent the position and orientation of the camera for new coordinates and orientation prediction of the camera). 
     Regarding claim 16, the combination of Schmid in view of Angelova teaches the system of claim, wherein Schmid discloses the step of the segmentation neural 
     Regarding claim 17, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid further discloses the step of the depth estimation neural network comprises one or more convolutional neural network layers (see lines 12 – 14 of paragraph 0023).
     Regarding claim 18, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid also discloses the step of the ego-motion estimation neural network comprises one or more convolutional neural network layers (see lines 8 – 10 of paragraph 0036, wherein Schmid discloses a motion mask encoder subnetwork that includes 3x3 convolutional neural network layers).
     Regarding claim 19, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid discloses the step of the ego-motion estimation neural network and the depth prediction neural network have been jointly trained using an unsupervised learning technique (see lines 4 – 11, right column of paragraph 0008).
     Regarding claim 20, the combination of Schmid in view of Angelova teaches the system of claim 1, wherein Schmid discloses the step of the ego-motion estimation neural network, the depth prediction neural network, and the segmentation neural network have been jointly trained using an unsupervised learning technique (see the last 9 lines of paragraph 0005).
     Regarding claim 21, Examiner notes that the claim recites one or more non-transitory computer storage media encoded with instructions that, when executed by 
     The combination of Schmid in view of Angelova teaches the system according to claim 1 – see rejection above. 
     Accordingly, claim 21 is also rejected under 35 U.S.C. 103 as being obvious over Schmid in view of Angelova.
     Regarding claim 22, Examiner notes that the claim recites a computer-implemented method comprising the limitations according to claim 1.
     The combination of Schmid in view of Angelova teaches the system according to claim 1 – see rejection above. 
     Accordingly, claim 22 is also rejected under 35 U.S.C. 103 as being obvious over Schmid in view of Angelova.
     This rejection under 35 U.S.C. 103 might be overcome by: (1) a showing under 37 CFR 1.130(a) that the subject matter disclosed in the reference was obtained directly or indirectly from the inventor or a joint inventor of this application and is thus not prior art in accordance with 35 U.S.C. 102(b)(2)(A); (2) a showing under 37 CFR 1.130(b) of a prior public disclosure under 35 U.S.C. 102(b)(2)(B); or (3) a statement pursuant to 35 U.S.C. 102(b)(2)(C) establishing that, not later than the effective filing date of the claimed invention, the subject matter disclosed and the claimed invention were either owned by the same person or subject to an obligation of assignment to the same person or subject to a joint research agreement. See generally MPEP § 717.02.
 
Conclusion
     The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
     Marrero et al. (US 2020/0302612 A1) discloses a system for producing more accurate boundary information when the shape of the data is used with semantic segmentation data.
     Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLAUDE NOEL Y. ZANETSIE whose telephone number is (571)272-4663.  The examiner can normally be reached on Monday - Friday 8:00 am - 4:00 pm.
 Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
 If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHAN PARK can be reached on (571) 272-7409.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
 Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private 






/CLAUDE NOEL Y ZANETSIE/Examiner, Art Unit 2669
/CHAN S PARK/Supervisory Patent Examiner, Art Unit 2669