Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
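For illustration only, the three-prong analysis above can be sketched as a simple predicate. The class name, field names, and placeholder list below are hypothetical and are not part of the MPEP; whether a term is a generic placeholder, and whether a limitation recites sufficient structure, remain case-by-case determinations that this sketch takes as inputs.

```python
from dataclasses import dataclass

# Hypothetical, non-exhaustive set of terms treated here as generic
# placeholders. This is an assumption for illustration, not a legal rule.
GENERIC_PLACEHOLDERS = {
    "means", "step", "mechanism", "module", "device", "unit",
    "component", "element", "member", "apparatus", "machine", "system",
}


@dataclass
class Limitation:
    term: str                            # e.g. "image module"
    has_functional_language: bool        # prong (B), e.g. "to generate ..."
    recites_sufficient_structure: bool   # negation of prong (C)
    has_structural_modifier: bool = False


def invokes_112f(lim: Limitation) -> bool:
    """Return True when all three prongs are met for the limitation."""
    head = lim.term.lower().split()[-1]
    # Prong (A): "means", "step", or a generic placeholder with no
    # structural modifier in front of it.
    prong_a = head in GENERIC_PLACEHOLDERS and not lim.has_structural_modifier
    # Prong (B): the term is modified by functional language.
    prong_b = lim.has_functional_language
    # Prong (C): the term is not modified by sufficient structure.
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c
```

Under these assumptions, a limitation such as "an image module ... to generate" with no recited structure meets all three prongs, while the same limitation amended to recite sufficient structure does not.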

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a depth system…to generate” & “an image module…to generate…”
Since the claim limitation(s) invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, Claims 1-7 has/have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.  
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph limitation: 
See Fig. 2 and [0023], wherein the depth system 108 includes a processor and memory for processing, including a software image module 230.

If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Objections
Claim 1 is objected to because of the following informalities:  
Claim 1 recites, “…a depth system including instructions that when executed by the one or more processors cause the on or more processors to, in response to, to generate a plurality of depth maps by,” which appears to be grammatically incorrect or missing language.  For purposes of examination, the Examiner has interpreted the claim as, “…executed by the one or more processors to generate a plurality of depth maps by.”
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 3-8, 10-15, 17-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shi et al. (“Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment”) in view of Cricri (US 20190311259) and Pradeep et al. (“MonoFusion: Real-time 3D Reconstruction of small Scenes with a Single Web Camera”).
Re claim 1, Shi teaches a system for generating depth estimates of an environment, comprising: 
one or more processors; a memory communicably coupled to the one or more processors and storing: (see p. 1, introduction, computer vision research, where computer vision inherently requires a computer processor and memory including instructions executed by a computer processor)
a depth system including instructions that when executed by the one or more processors cause the one or more processors to generate a plurality of depth maps by: (see p. 3, 3. Our Approach, in reference to Fig. 2, wherein a dense depth map D is first generated for the target view) and (see p. 4, in reference to equation 1, wherein for an input image i, a dense depth map Di is created).
receiving a plurality of monocular images, each of the plurality of monocular images capturing substantially the same environment (see p. 1, Abstract, wherein the approach only needs unlabeled monocular videos, a video being interpreted as a plurality of images in sequence), (see Figure 2, wherein the pipeline of the system takes a sequence of images, interpreted to be a plurality of target views It), (see p. 3, in reference to Fig. 2, wherein a CNN is used to generate a dense per-pixel depth map for the target view image using the unlabeled monocular videos), and (see Fig. 5, wherein the input images include substantially the same environment of street/road images).
processing each of the monocular images according to a depth model (see p. 3, in reference to Fig. 2, wherein a depth CNN processes each target view It) and (see Fig. 2, wherein a depth map is generated for each input image).
Shi does not explicitly teach wherein the depth model is an overfit depth model.
However, Cricri teaches wherein the depth model is an overfit depth model ([0042]: In the simplest case, video content is a three-dimensional data type, of which two dimensions are spatial and a third dimension is temporal, thus overfitting can occur in any combination of these dimensions. Here, for the sake of simplicity, we will consider only the temporal dimension for overfitting. However, video may also have other dimensions, such as the depth or distance dimension, which represents the distance of captured points in the scene from the capturing device. The neural network may be overfitted to one or more portions of such video data, for example, a temporal portion of a video, a spatial portion of one or more video frames, or a depth map) and (see [0032-0034], [0041], wherein a neural network is overfit onto target content and sent to a destination device with the target content in order to obtain a better performing neural network for content processing…the content in question may be a video, wherein at least a first and second portion of the video content is overfitted by the neural network).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Shi’s depth system including processing a plurality of images according to a CNN depth model to explicitly include an overfit depth model, as taught by Cricri, as the references are in the analogous art of sending media content data to be trained by a neural network.  An advantage of the modification is that it achieves the result of using overfitting for a neural network to perform well on a selected set of content to process, such as enhancement of video content (see Cricri [0025-0033]).
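For illustration only, the notion of a model overfit to a selected set of content can be sketched with a toy example. The two-parameter linear model, the function names, and the sample values below are hypothetical stand-ins chosen for brevity; they are not the neural network of Cricri or the depth CNN of Shi. The sketch trains on the target content alone, with no held-out data, and then compares the error on that content with the error on different content.

```python
def overfit_linear(samples, steps=2000, lr=0.05):
    """Fit y = a*x + b to `samples` by gradient descent on the mean squared
    error, using only the target content (no validation split)."""
    a, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in samples) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def mean_squared_error(params, samples):
    """Average squared prediction error of the fitted model on `samples`."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in samples) / len(samples)


# Hypothetical (input, target) pairs for two different pieces of content.
target_content = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
other_content = [(0.0, 3.0), (1.0, 3.5), (2.0, 4.0)]

params = overfit_linear(target_content)
error_on_target = mean_squared_error(params, target_content)
error_on_other = mean_squared_error(params, other_content)
```

In this toy setting the fitted parameters reproduce the target content almost exactly while performing noticeably worse on the other content, which is the trade-off an overfit model accepts.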
Shi and Cricri do not explicitly teach an image module including instructions that when executed by the one or more processors cause the one or more processors to generate a three-dimensional reconstruction of the environment based on the plurality of depth maps.
However, Pradeep teaches an image module including instructions that when executed by the one or more processors cause the one or more processors to generate a three-dimensional reconstruction of the environment based on the plurality of depth maps (see p. 83, Abstract, in reference to Fig. 1, wherein a dense 3D reconstruction of the environment is built from a single camera, by estimating pose and using dense depth maps to directly fuse into a voxel-based implicit model), (see p. 84, 2. System Overview, wherein a live stream of images of a scene from a single moving camera is used to generate and maintain a live, dense 3D reconstruction of the scene…pair of frames and camera poses are used to compute a dense depth map for the current frame…dense depth maps are computed and integrated into a voxel-based representation of the scene by applying volumetric fusion).
Shi, Cricri, and Pradeep teach claim 1.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Shi and Cricri’s depth system to explicitly include 3D reconstruction using depth maps, as taught by Pradeep, as the references are in the analogous art of using input images for generating depth maps for further processing.  An advantage of the modification is that it achieves the result of explicitly using depth maps for 3D reconstruction of the environment based on the input 2D images.
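For illustration only, building a three-dimensional reconstruction from a plurality of depth maps can be sketched as follows. This is a simplified voxel-voting stand-in, not the volumetric fusion described in Pradeep. The function name, the pinhole intrinsics (fx, fy, cx, cy), the camera-to-world pose format (R, t), and the `min_views` threshold are all assumptions made for the sketch.

```python
from collections import Counter
from math import floor


def fuse_depth_maps(depth_maps, poses, intrinsics, voxel_size=0.05, min_views=2):
    """Fuse per-image depth maps into a set of occupied voxel indices.

    depth_maps: list of 2D lists of depth values (None or <= 0 means invalid)
    poses:      list of (R, t) camera-to-world poses, R as 3x3 nested lists
    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera
    """
    fx, fy, cx, cy = intrinsics
    votes = Counter()
    for depth, (R, t) in zip(depth_maps, poses):
        seen = set()  # voxels observed by this view (counted once per view)
        for v, row in enumerate(depth):
            for u, d in enumerate(row):
                if d is None or d <= 0:
                    continue
                # Back-project the pixel into the camera frame.
                pc = ((u - cx) * d / fx, (v - cy) * d / fy, d)
                # Transform the point into the world frame.
                pw = tuple(
                    sum(R[i][j] * pc[j] for j in range(3)) + t[i]
                    for i in range(3)
                )
                seen.add(tuple(floor(c / voxel_size) for c in pw))
        votes.update(seen)
    # Keep voxels that at least `min_views` depth maps agree on.
    return {voxel for voxel, n in votes.items() if n >= min_views}
```

Requiring agreement between views is one simple way to suppress isolated depth errors; a fusion method that averages signed distances per voxel would be a closer analogue of volumetric fusion.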
Re claim 3, Shi, Cricri, and Pradeep teach claim 1.  Furthermore, Cricri teaches wherein the overfit depth model comprises a convolution neural network (see abstract, [0006-0007], train a neural network to be overfitted to at least a first portion of media content) and (see [0083-0086], in reference to Fig. 2, wherein deep learning techniques include CNN, which are highly attractive in text, image, video, and speech applications, wherein the input to a CNN is an image or other media content such as a video file).  For motivation, see claim 1.
Re claim 4, Shi, Cricri, and Pradeep teach claim 1.  Furthermore, Shi teaches wherein the overfit depth model comprises a pose network (see p. 3, in reference to Fig. 2, wherein a Pose CNN takes a pair of images as inputs and outputs relative poses).
Re claim 5, Shi, Cricri, and Pradeep teach claim 4.  Furthermore, Shi teaches wherein the pose network outputs a relative pose between a pair of monocular images (see p. 3, in reference to Fig. 2, wherein a Pose CNN takes a pair of monocular images as inputs and outputs relative poses).
Re claim 6, Shi, Cricri, and Pradeep teach claim 1.  Furthermore, Shi teaches wherein the overfit depth model comprises a depth network (see p. 3, in reference to Fig. 2, wherein a depth CNN is a depth network).
Re claim 7, Shi, Cricri, and Pradeep teach claim 6.  Furthermore, Pradeep teaches wherein the depth network projects 2d pixels into 3d points (see p. 85, 4 Depth Estimation, wherein estimating depth is based on stereo matching across the live image and a previously selected key frame.  Stereo matching is the task of finding corresponding pixels between two images taken from different but known viewpoints, wherein the system starts generating per-pixel depth frames for every new incoming frame in the video stream I…the goal is to search for a depth value for each pixel I = (u,v) in image I that has minimal costs among all possible depth values and “Let Ip be a square patch in image I centered at pixel p and I’p the projection of this patch into image I’ according to depth d,” and convert pixel I into 3D scene point x according to depth d).  For motivation, see claim 1.
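For illustration only, converting a 2D pixel into a 3D scene point according to a depth value can be sketched with a pinhole camera model. The pinhole model, the function names, and the intrinsic parameters (fx, fy, cx, cy) are assumptions made for the sketch and are not taken from the cited references.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with the given depth to a 3D point in the camera
    frame, assuming a pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def depth_map_to_points(depth_map, fx, fy, cx, cy):
    """Back-project every pixel with a valid (positive) depth in a 2D list
    of depth values; rows are indexed by v and columns by u."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d is not None and d > 0:
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points
```

A pixel at the principal point maps to a point on the optical axis at the given depth, and pixels farther from the principal point map to points farther off-axis in proportion to depth.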
Claims 8 and 15 recite limitations similar in scope to claim 1 and are rejected for at least the reasons above.
Claims 10-14 recite limitations similar in scope to claims 3-7 and are rejected for at least the reasons above.
Claims 17-20 recite limitations similar in scope to claims 3-6 and are rejected for at least the reasons above.

Claim(s) 2, 9, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Shi et al. (“Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment”) in view of Cricri (US 20190311259), Pradeep et al. (“MonoFusion: Real-time 3D Reconstruction of small Scenes with a Single Web Camera”), and Liu et al. (US 20200193623).
Re claim 2, Shi, Cricri, and Pradeep teach claim 1.  Shi, Cricri, and Pradeep do not explicitly teach wherein the depth system comprises instructions that when executed by the one or more processors cause the one or more processors to filter the plurality of depth maps to remove non-static objects.
However, Liu teaches the depth system comprises instructions that when executed by the one or more processors cause the one or more processors to filter the plurality of depth maps to remove non-static objects (see [0080-0081], wherein a non-static object is extracted, and the non-static object region is removed from the global sparse depth map).
Shi, Cricri, Pradeep, and Liu teach claim 2.  It would have been obvious to one of ordinary skill in the art at the time of filing to modify Shi, Cricri, and Pradeep’s depth system of generating depth maps to explicitly include filtering the plurality of depth maps to remove non-static objects, as taught by Liu, as the references are in the analogous art of generation of depth maps from image data.  An advantage of the modification is that it achieves the result of using a filter on the depth map to extract non-static objects to improve accuracy (see Liu [0078-0079]).
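For illustration only, filtering a depth map to remove a non-static object region can be sketched as a masking step. The function name, the boolean mask format, and the use of None as the invalid-depth marker are assumptions made for the sketch; how the non-static region is detected (for example by segmentation or motion cues) is outside its scope and is not taken from Liu.

```python
def remove_non_static(depth_map, non_static_mask, invalid=None):
    """Return a copy of `depth_map` in which every pixel flagged True in
    `non_static_mask` is replaced by `invalid`.

    Both arguments are 2D lists with identical shapes.
    """
    if len(depth_map) != len(non_static_mask):
        raise ValueError("depth map and mask must have the same number of rows")
    filtered = []
    for depth_row, mask_row in zip(depth_map, non_static_mask):
        if len(depth_row) != len(mask_row):
            raise ValueError("depth map and mask rows must have the same length")
        filtered.append(
            [invalid if is_non_static else d
             for d, is_non_static in zip(depth_row, mask_row)]
        )
    return filtered


def filter_depth_maps(depth_maps, masks):
    """Apply the same masking to each depth map in a plurality of maps."""
    return [remove_non_static(d, m) for d, m in zip(depth_maps, masks)]
```

Invalidated pixels can then be skipped by any later stage that consumes the depth maps, so that moving objects do not contribute to the reconstruction.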
Claim 9 recites limitations similar in scope to claim 2 and is rejected for at least the reasons above.
Claim 16 recites limitations similar in scope to claim 2 and is rejected for at least the reasons above.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Peter Hoang whose telephone number is (571)270-1346. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PETER HOANG/             Primary Examiner, Art Unit 2616