DETAILED ACTION
This communication is in response to the claims filed on 12/03/2019. 
Application No: 16/701,515.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Reasons for Allowance
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
The reason for allowance is that the prior art of record fails to teach the claimed limitations, together with the preamble, when each claim is considered as a whole. The limitations recited in the independent claims comprise a particular combination of elements, functions, and preamble that is neither taught nor suggested by the prior art of record.


The distinguishing features of representative claim 1 are underlined and summarized below:
A depth system for semi-supervised training of a depth model for monocular depth estimation, comprising: 
one or more processors; a memory communicably coupled to the one or more processors and storing: a training module including instructions that when executed by the one or more processors cause the one or more processors to:
 	train the depth model according to a first stage that is self-supervised and that includes using first training data that comprises pairs of training images, 
wherein respective ones of the pairs include separate frames depicting a scene from a monocular video, wherein the training module includes instructions to produce first stage loss values that update the depth model and a pose model, 
wherein the pose model facilitates the first stage according to a structure from motion (SfM) process, and train the depth model according to a second stage that is weakly supervised and that includes using second training data to produce depth maps according to the depth model, 
wherein the second training data comprising individual images with corresponding sparse depth data, wherein the training module includes instructions to produce second stage loss values that are based, at least in part, on the depth maps and the depth data; and 
a network module including instructions that when executed by the one or more processors cause the one or more processors to provide the depth model to infer distances from monocular images in a device.


The distinguishing features of representative claim 9 are underlined and summarized below:
A non-transitory computer-readable medium for semi-supervised training of a depth model for monocular depth estimation and including instructions that when executed by one or more processors cause the one or more processors to:
train the depth model according to a first stage that is self-supervised and that includes using first training data that comprises pairs of training images,
 wherein respective ones of the pairs include separate frames depicting a scene from a monocular video, wherein the instructions include instructions to produce first stage loss values that update the depth model and a pose model,
wherein the pose model facilitates the first stage according to a structure from motion (SfM) process;
train the depth model according to a second stage that is weakly supervised and that includes using second training data to produce depth maps according to the depth model, 
wherein the second training data comprises individual images with corresponding sparse depth data, wherein the instructions to train according to the second stage include instructions to produce second stage loss values that are based, at least in part, on the depth maps and the depth data; and
provide the depth model to infer distances from monocular images in a device.


The distinguishing features of representative claim 13 are underlined and summarized below:
 A method of semi-supervised training of a depth model for monocular depth estimation, comprising:
training the depth model according to a first stage that is self-supervised and that includes using first training data that comprises pairs of training images,
 wherein respective ones of the pairs including separate frames depicting a scene of a monocular video, wherein the pairs of training images provide for producing first stage loss values to update the depth model and a pose model, 
wherein the pose model facilitates the first stage according to a structure from motion (SfM) process;
training the depth model according to a second stage that is weakly supervised and that includes using second training data to produce depth maps according to the depth model, the second training data comprising individual images with corresponding sparse depth data, the second training data providing for updating the depth model according to second stage loss values that are based, at least in part, on the depth maps and the depth data; and
providing the depth model to infer distances from monocular images in a device.


Applicant's independent claim 1 comprises a particular combination of the underlined features together with the other recited limitations, which is neither taught nor suggested by the prior art of record when the claim is considered as a whole.
Similarly, independent claims 9 and 13 comprise a particular combination of the underlined features together with other recited limitations of analogous wording, which is neither taught nor suggested by the prior art of record when each claim is considered as a whole.
Dependent claims are deemed allowable for the same reasons as corresponding independent claims.
 

Prior Art References 
The closest references of record, VENDAS, Abhiram, and Weston, teach the following:
 	VENDAS (US 20210304430 A1) teaches methods for simultaneous localization and mapping from video with adversarial shape prior learning in real-time. For example, an unsupervised direct and dense SLAM may learn a geometry prior from data. Given a video sequence, a depth map of a target frame, as well as the target frame and the camera motions between the target frame and all the remaining frames may be output. Further, by fusing a camera motion estimate with a positional sensor's output, positional drift and the need for loop closure can be avoided.

Abhiram (US 20190287297 A1) teaches machine-readable media for determining a three-dimensional environment model of the environment of one or more camera devices, in which image processing for inferring the model may be performed at the camera devices.

Weston (US 20090204558 A1) teaches a method for training a learning machine having a deep network with a plurality of layers, the method including applying a regularizer to one or more of the layers of the deep network; training the regularizer with unlabeled data; and training the deep network with labeled data. Weston also teaches an apparatus for use in discriminative classification and regression, including an input device for inputting unlabeled and labeled data associated with a phenomenon of interest; a processor; and a memory communicating with the processor. The memory includes instructions executable by the processor for implementing a learning machine having a deep network structure and training the learning machine by applying a regularizer to one or more of the layers of the deep network; training the regularizer with unlabeled data; and training the deep network with labeled data.

However, the cited references, alone or in any combination, neither disclose nor fairly suggest the combination of features specifically listed above and/or underlined, in particular:
wherein respective ones of the pairs include separate frames depicting a scene from a monocular video, wherein the training module includes instructions to produce first stage loss values that update the depth model and a pose model, 
wherein the pose model facilitates the first stage according to a structure from motion (SfM) process, and train the depth model according to a second stage that is weakly supervised and that includes using second training data to produce depth maps according to the depth model, 
wherein the second training data comprising individual images with corresponding sparse depth data, wherein the training module includes instructions to produce second stage loss values that are based, at least in part, on the depth maps and the depth data; and 
a network module including instructions that when executed by the one or more processors cause the one or more processors to provide the depth model to infer distances from monocular images in a device.
 
VENDAS teaches methods for simultaneous localization and mapping from video with adversarial shape prior learning in real-time, but fails to teach one or more limitations, including:
wherein respective ones of the pairs include separate frames depicting a scene from a monocular video, wherein the training module includes instructions to produce first stage loss values that update the depth model and a pose model, 
wherein the pose model facilitates the first stage according to a structure from motion (SfM) process, and train the depth model according to a second stage that is weakly supervised and that includes using second training data to produce depth maps according to the depth model, 
wherein the second training data comprising individual images with corresponding sparse depth data, wherein the training module includes instructions to produce second stage loss values that are based, at least in part, on the depth maps and the depth data; and 
a network module including instructions that when executed by the one or more processors cause the one or more processors to provide the depth model to infer distances from monocular images in a device.

Abhiram and Weston, alone or in combination, fail to cure the deficiencies of VENDAS.

	 Thus, the cited references, alone or in combination, fail to disclose or suggest each of the elements recited by the independent claims.


The present invention provides an improved method for training machine learning algorithms to determine depths of a scene from a monocular image. Further, various devices that operate autonomously or that provide information about a surrounding environment use sensors that facilitate perceiving obstacles and additional aspects of the surrounding environment. For example, a robotic device may use information from the sensors to develop an awareness of the surrounding environment in order to navigate through the environment. In particular, the robotic device uses the perceived information to determine a 3-D structure of the environment in order to identify navigable regions and avoid potential hazards. The ability to perceive distances through estimation of depth using sensor data provides the robotic device with the ability to plan movements through the environment and generally improve situational awareness about the environment. However, depending on the available onboard sensors, the robotic device may acquire a limited perspective of the environment and, thus, may encounter difficulties in distinguishing between aspects of the environment. The present invention addresses the above problems in that the semi-supervised training with weak supervision improves the ability of the depth model to accurately infer depths without using extensively annotated training data from more complex sensors.
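For illustration only, the weakly supervised second stage summarized above can be pictured as a loss computed solely at the pixels where sparse depth data exists. The following is a minimal sketch under stated assumptions and is not taken from the application or from any reference of record: the function name sparse_depth_l1, the choice of a mean absolute (L1) error, and the convention that 0.0 marks a pixel with no depth measurement are all hypothetical.

```python
def sparse_depth_l1(pred, sparse_gt):
    """Mean absolute error over pixels that have a sparse depth measurement.

    pred, sparse_gt: 2-D lists of floats with the same shape.
    Assumption (hypothetical convention): 0.0 in sparse_gt means "no measurement".
    """
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, sparse_gt):
        for p, g in zip(pred_row, gt_row):
            if g > 0.0:  # only supervised where sparse depth exists
                total += abs(p - g)
                count += 1
    # With no valid pixels there is nothing to supervise, so return 0.0.
    return total / count if count else 0.0


# A 2x2 predicted depth map with only two measured pixels.
pred = [[2.0, 5.0], [10.0, 4.0]]
sparse_gt = [[0.0, 4.0], [12.0, 0.0]]
loss = sparse_depth_l1(pred, sparse_gt)  # (|5 - 4| + |10 - 12|) / 2
```

Unmeasured pixels contribute nothing to this value, which is one way a depth map could be compared against depth data that covers only part of the image.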

Therefore, when the application is taken as a whole, incorporating all the respective limitations, none of the prior art of record discloses the features as claimed.

Conclusion
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submission should be clearly labeled "Comments on Statement of Reasons for Allowance."

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Mahendra Patel whose telephone number is (571)270-7499. The examiner can normally be reached from 9:30 AM to 5:30 PM (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Anthony Addy, can be reached on (571) 272-779. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/MAHENDRA R PATEL/Primary Examiner, Art Unit 2645