DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 19 is objected to because of the following informalities: claim 19 recites “The apparatus of claim 17, wherein the generating of the respective masks comprises generating the respective masks based on a baseline of a camera”; however, masks are not introduced until claim 18. For the purposes of examination, this is interpreted as a typographical error, and claim 19 is treated as depending from claim 18. Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 10-12, 14-17, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Mehta et al. (“Structured Adversarial Training for Unsupervised Monocular Depth Estimation”, 2018).

Regarding claims 1 and 15, Mehta et al. disclose a method with algorithm updating and an apparatus with algorithm updating comprising a processor (processor, part 4.1), the method comprising, and the processor being configured for: receiving a first input batch comprising one or more first images (training pairs Li, Ri for the incoming right and left views, part 3); generating a first output batch with respect to the first input batch using an algorithm configured to generate a disparity image, the first output batch comprising one or more first output images (Di is the depth for the point i with disparity di, B is the baseline for the stereo camera and f is the focal length; with a known dense disparity map, a stereo pair can be generated from the image corresponding to the disparity map; generator takes Li as the input and generates a pair of disparities (DLi, DRi), part 3); receiving a second input batch corresponding to the first input batch, the second input batch comprising one or more second images having viewpoints that are different from viewpoints of the one or more first images (generating right view Ri’ from left view Li; generator GθG learns to synthesize the adjacent right view for an input image using the generated disparity DRi, part 3.1); generating a test batch based on the first output batch and the second input batch, the test batch comprising one or more test images (a bilinear sampler (S) [16] generates the stereo pair as Li’ = S(DLi; Ri) and Ri’ = S(DRi; Li); (Li, Ri) pairs are present only during training, part 3, 
[image: media_image1.png], equation 3, see specifically [image: media_image2.png], part 3.1) [terms “pairs are present only during training” and 
[image: media_image3.png]
[image: media_image4.png]
where L is the loss function using ℓ1 loss, Ri’ is a synthesized right view created by the bilinear sampler, and Ri is the input right view, Eq. 6, Eq. 7, part 3.4, train the generator using adversarial loss and photometric reconstruction loss) [training the generator will necessarily involve “updating the algorithm”].
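For illustration only, the view-synthesis and comparison steps relied upon above can be sketched as follows. This is a minimal sketch and not the implementation of Mehta et al.: the function names `warp_with_disparity` and `l1_photometric_loss` are hypothetical, the sign convention of the horizontal shift is an assumption, and for a purely horizontal disparity the bilinear sampler is assumed to reduce to linear interpolation along each image row.

```python
import numpy as np

def warp_with_disparity(source, disparity):
    """Synthesize a view by resampling each row of `source` at x - disparity.

    Assumption: the disparity is purely horizontal, so bilinear sampling
    reduces to 1-D linear interpolation along each row.
    """
    height, width = source.shape
    xs = np.arange(width, dtype=np.float64)
    warped = np.empty_like(source, dtype=np.float64)
    for row in range(height):
        # Clamp sampling positions to the image border.
        sample_x = np.clip(xs - disparity[row], 0, width - 1)
        warped[row] = np.interp(sample_x, xs, source[row])
    return warped

def l1_photometric_loss(target, synthesized):
    """Mean absolute pixel difference between a real and a synthesized view."""
    return float(np.mean(np.abs(target - synthesized)))

# Hypothetical 2x5 ramp image and a constant one-pixel disparity.
left = np.tile(np.arange(5.0), (2, 1))
synthesized = warp_with_disparity(left, np.ones((2, 5)))
loss = l1_photometric_loss(left, synthesized)
```

Under these assumptions, a training step would adjust the generator so that `loss` decreases, which corresponds to the “updating the algorithm” interpretation stated above.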

Mehta et al. do not use the term “test batch”. However, because Mehta et al. specify that “pairs are present only during training” and also refer to “p_train(R)”, it would have been obvious at the time of filing to one of ordinary skill in the art that these teachings can be interpreted as a test batch, such that equations 6 and 7 incorporate this test batch in the final difference calculations.

Regarding claims 2 and 16, Mehta et al. disclose the method and apparatus of claims 1 and 15. Mehta et al. further disclose wherein the one or more first images comprise at least one left image captured by a left camera of stereo cameras and at least one right image captured by a right camera of the stereo cameras (Point-to-point correspondence between the left view and the right view of a stereo pair, generate the adjacent stereo-view from a single image, We use a one-step approach instead, in which a dense disparity map is generated from a single image, 

Regarding claim 3, Mehta et al. disclose the method of claim 1. Mehta et al. further disclose the one or more first images do not have a label indicating a viewpoint (unsupervised methods, abstract; Unsupervised Depth Estimation, Capturing ground-truth depth maps is expensive and time consuming. An alternate strategy to train CNNs for depth estimation is to use multiview supervision as a proxy for depth supervision, Our method is unsupervised as it requires only stereo images for training and it even performs better than most depth-supervised techniques, part 2) [unsupervised means the data is unlabeled; while the paper does not specify that it is the viewpoint that is unlabeled, it would have been obvious that the unlabeled data would include monocular input images without viewpoint labels].

Regarding claim 4, Mehta et al. disclose the method of claim 1. Mehta et al. further disclose the one or more first images and the one or more second images are stereo images (Point-to-point correspondence between the left view and the right view of a stereo pair, generate the adjacent stereo-view from a single image, We use a one-step approach instead, in which a dense disparity map is generated from a single image, which is in-turn used to create the adjacent stereo view, part 3; The left-disparity is taken as the left view of a stereo pair and is warped into its corresponding right view, part 3.4).

Regarding claim 5, Mehta et al. disclose the method of claim 1. Mehta et al. further disclose the algorithm is a neural network-based algorithm (train a generator network as a feed-forward CNN, part 3).

Regarding claims 6 and 17, Mehta et al. disclose the method and apparatus of claims 1 and 15. Mehta et al. further disclose the updating of the algorithm comprises: determining the difference between the first input batch and the test batch; and updating the algorithm to reduce the difference between the first input batch and the test batch (The adversarial objective influenced by the parameters θG of the generator is minimized to train the generator, part 3.3, We use photometric losses to incentivize the generator to create images which are pixel-wise similar to right views corresponding to input left views, part 3.4).

Regarding claim 10, Mehta et al. disclose the method of claim 1. Mehta et al. further disclose a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1 (Our models are implemented using the Tensorflow [1] deep learning framework. Training is performed on two NVIDIA 1080 Ti GPUs, part 4.1, memory efficiency, part 4.2) [while a non-transitory computer-readable storage medium is not specifically cited, it would have been obvious that a computer comprising such a memory is implied].

Regarding claims 11 and 22, Mehta et al. disclose a method with algorithm updating and an apparatus with algorithm updating comprising a processor (processor, part 4.1), the method comprising, and the processor being configured for: receiving a first input batch comprising one or more first images (training pairs Li, Ri for the incoming right and left views, part 3); generating a first output batch with respect to the first input batch using an algorithm for generating a disparity image, the first output batch comprising one or more first output images (Di is the depth for the point i with disparity di, B is the baseline for the stereo camera and f is the focal length, with a known 
[image: media_image1.png], equation 3, see specifically [image: media_image2.png], part 3.1) [the terms “pairs are present only during training” and “p_train” are interpreted as corresponding to a “test batch”]; receiving a second input batch corresponding to the first input batch, the second input batch comprising one or more second images having viewpoints that are different from viewpoints of the one or more first images (generating right view Ri’ from left view Li, part 3.1); and updating the algorithm based on a difference between the second input batch and the test batch (
[image: media_image5.png], equation 8, part 3.4) [training the generator will necessarily involve “updating the algorithm”].

train ( R )”, this can be interpreted as a test batch, and thereby equation 8 incorporates this test batch in the final difference calculations.

Regarding claim 12, Mehta et al. disclose the method of claim 11. Mehta et al. further disclose determining the difference between the second input batch and the test batch; and updating the algorithm to reduce the difference between the second input batch and the test batch (The adversarial objective influenced by the parameters θG of the generator is minimized to train the generator, part 3.3, We use photometric losses to incentivize the generator to create images which are pixel-wise similar to right views corresponding to input left views, part 3.4).

Regarding claim 14, Mehta et al. disclose the method of claim 11. Mehta et al. further disclose a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 11 (Our models are implemented using the Tensorflow [1] deep learning framework. Training is performed on two NVIDIA 1080 Ti GPUs, part 4.1, memory efficiency, part 4.2) [while a non-transitory computer-readable storage medium is not specifically cited, it would have been obvious that a computer comprising such a memory is implied].

Regarding claim 21, Mehta et al. disclose the apparatus of claim 15. Mehta et al. further disclose a memory storing a program, wherein the processor is configured to execute the program to receive the first input batch, generate the first output batch, receive the second 

Regarding claim 23, Mehta et al. disclose the apparatus of claim 22. Mehta et al. further disclose stereo cameras including a left camera and a right camera, wherein the one or more first images comprise at least one left image generated by the left camera and at least one right image generated by the right camera (Point-to-point correspondence between the left view and the right view of a stereo pair, generate the adjacent stereo-view from a single image, We use a one-step approach instead, in which a dense disparity map is generated from a single image, which is in-turn used to create the adjacent stereo view, part 3; The left-disparity is taken as the left view of a stereo pair and is warped into its corresponding right view, part 3.4).

Claims 7, 8, 13, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mehta et al. (“Structured Adversarial Training for Unsupervised Monocular Depth Estimation”, 2018) as applied to claims 6, 12 and 15 above, further in view of Wang et al. (US 20190301861 A1).

Regarding claims 7, 13, and 18, Mehta et al. disclose the method and system of claims 6, 12, and 15. Mehta et al. do not disclose the determining of the difference between the first input batch and the test batch comprises: generating respective masks on a first image, among 

Wang et al. teach the determining of the difference between the first input batch and the test batch comprises: generating respective masks on a first image, among the one or more first images, and a test image, among the one or more test images; and determining the difference between the first input batch and the test batch based on regions, excluding the masks, of the first image and the test image (“After the confidence volume is obtained, on one hand, an argmax value for the confidence levels of all disparity values in the disparity dimension for each pixel point in the confidence volume is calculated as an output, such that a complete, dense disparity map may be obtained. However, this disparity map contains many pixel points having low confidence levels or matching errors. On the other hand, a confidence map is obtained by selecting a maximum value from confidence levels of all the disparity values in the disparity dimension for each pixel point in the confidence volume, and the confidence map is converted into “0”s and “1”s to obtain a mask map. Finally, a target disparity map is obtained by multiplying the mask map with the disparity map, such that those pixel points having low confidence levels or matching errors in the disparity map may be filtered out and only the pixel points having high confidence levels will be maintained. A distance may be estimated more accurately based on the target disparity map”, [0018]).
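For illustration only, the mask generation quoted above from Wang et al. at [0018] can be sketched as follows. This is a minimal sketch rather than the implementation of Wang et al.: the function name `masked_disparity`, the array layout (disparity dimension first), and the threshold value of 0.5 used to convert the confidence map into “0”s and “1”s are all assumptions.

```python
import numpy as np

def masked_disparity(confidence_volume, threshold=0.5):
    """Build a target disparity map from a confidence volume.

    Assumption: confidence_volume has shape (num_disparities, height, width).
    """
    # Dense disparity map: index of the most confident disparity per pixel.
    disparity_map = np.argmax(confidence_volume, axis=0)
    # Confidence map: the maximum confidence per pixel.
    confidence_map = np.max(confidence_volume, axis=0)
    # Mask map of 0s and 1s (the threshold is a hypothetical choice).
    mask = (confidence_map >= threshold).astype(disparity_map.dtype)
    # Target disparity map: low-confidence pixels are zeroed out.
    return disparity_map * mask, mask

# Hypothetical volume with 3 candidate disparities for a 1x2 image.
volume = np.array([[[0.1, 0.3]],
                   [[0.8, 0.3]],
                   [[0.1, 0.4]]])
target, mask = masked_disparity(volume)
```

In this sketch the first pixel is kept with disparity index 1, while the second pixel is filtered out because its best confidence falls below the assumed threshold. Note that a filtered pixel and a pixel whose disparity index is 0 both appear as 0 in `target`, so the mask is returned alongside it.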

Mehta et al. and Wang et al. are in the same art of disparity maps (Mehta et al., part 3, part 3.1; Wang et al., [0018]). The combination of Wang et al. with Mehta et al. enables use of a mask. It would have been obvious at the time of filing to one of ordinary skill in the art to 

Regarding claims 8 and 19, Mehta et al. and Wang et al. disclose the method and system of claims 7 and 18. Mehta et al. and Wang et al. further indicate the generating of the respective masks comprises generating the respective masks based on a baseline of a camera (Mehta et al., equation based on camera baseline: “For stereo synthesis, the baseline is assumed to be fixed throughout the training set, most frameworks implicitly use camera-pose supervision”, part 2, [image: media_image6.png], part 3, for generating views from 2.5D with severe changes in camera pose, generating right views with fractional baselines, part 3.2; Wang et al., mask generation, [0015], [0017]) [together these teach the claim, as Mehta et al. use the baseline to create the depth map using disparity, and Wang et al. use a mask to create a disparity map].
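For illustration only, the role of the camera baseline in relating disparity to depth can be sketched with the standard stereo relation D = B·f / d, where D is depth, B is the baseline, f is the focal length, and d is the disparity. This is a sketch under that standard relation and is not taken from the equation image cited above; the function name `depth_from_disparity` and the numeric values are hypothetical.

```python
import numpy as np

def depth_from_disparity(disparity, baseline, focal_length):
    """Convert disparities to depths using D = B * f / d.

    Pixels with non-positive disparity are assigned infinite depth.
    """
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(d, np.inf)
    valid = d > 0
    depth[valid] = baseline * focal_length / d[valid]
    return depth

# Hypothetical values: 0.5 m baseline, 700-pixel focal length.
depths = depth_from_disparity([35.0, 70.0, 0.0], baseline=0.5, focal_length=700.0)
```

With these assumed values, doubling the disparity halves the depth, which is why a fixed baseline ties any disparity-based mask to the camera geometry.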

Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mehta et al. (“Structured Adversarial Training for Unsupervised Monocular Depth Estimation”, 2018) and Wang et al. (US 20190301861 A1) as applied to claims 7 and 17 above, further in view of Lopez et al. (US 20170085863 A1).

Regarding claims 9 and 20, Mehta et al. and Wang et al. disclose the method and system of claims 7 and 17. Mehta et al. and Wang et al. do not disclose generating of the respective masks comprises generating the respective masks based on either one or both of object information in the first image and object information in the test image.

Lopez et al. teach generating of the respective masks comprises generating the respective masks based on either one or both of object information in the first image and object information in the test image (One or more embodiments may use machine learning to generate an object depth model. For example, a 3D conversion dataset associated with a conversion example in the training set may include an object depth model input, and an object depth model output. The object depth model input may include, for example, an object mask, [0029], obtaining an external depth map associated with a two-dimensional image at 201, obtaining at least one mask associated with at least one area within the two-dimensional image at 202, calculating a fit or best fit for a plane using a computer based on depth associated with the at least one area associated with each of the at least one mask at 203, optionally, embodiments of the method may also automatically alter the position, orientation, shape, depth or curve of planes or masks to fit the edges of the planes or masks with other planes or masks for example at 204, applying depth associated with the plane having the fit to the at least one area to shift pixels in the two-dimensional image horizontally to produce a stereoscopic image or stereoscopic image pair, [0063], error-free depth applied to planes and/or masks of areas or regions associated with the two-dimensional input image, [0076], A training set 2201 of 2D to 3D conversion examples is used to train machine learning system, [0099]).

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Mehta et al. (“Structured Adversarial Training for Unsupervised Monocular Depth Estimation”, 2018) as applied to claim 22 above, further in view of Csordás et al. (US 10380753 B1).

Regarding claim 24, Mehta et al. disclose the apparatus of claim 22. Mehta et al. further disclose the algorithm comprises a neural network-based algorithm (feed-forward CNN, part 3). 

Csordás et al. teach the updating of the algorithm comprises updating the algorithm through backpropagation (generating a displacement map of a first input dataset and a second input dataset of an input dataset pair (e.g. a disparity map of a stereo image pair), col. 2, lines 10-20; “displacement map” shall refer to a certain n-dimensional generalized disparity, i.e. a generalization of the well-known stereo (left-right) disparity, col. 2, lines 25-40; “The loss function is typically backpropagated in the apparatus to fine tune the machine learning (neural network) components, but any other training algorithm can be used in place of backpropagation. In a supervised approach of displacement (e.g. disparity) generation, the resultant displacement map (and the calculated depth values) may be compared to e.g. LIDAR data; there is no need for such data in self-supervised or unsupervised approaches”, col. 7, lines 30-40; generating a displacement map of a first input dataset and a second input dataset of an input dataset pair (in the example of FIG. 2, the displacement map is a disparity map of a left image 20a and a right image 30a of a stereo image pair), col. 8, lines 20-40).
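For illustration only, updating an algorithm by propagating a loss back to its parameters can be sketched with a one-parameter model, where backpropagation reduces to a single application of the chain rule. This is a minimal sketch and not the training procedure of Csordás et al. or Mehta et al.: the function name `train_by_gradient_descent`, the linear model, the mean-squared-error loss, and the learning rate are all assumptions.

```python
import numpy as np

def train_by_gradient_descent(x, y, weight=0.0, learning_rate=0.1, steps=50):
    """Fit prediction = weight * x by repeatedly stepping against the gradient."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    losses = []
    for _ in range(steps):
        error = weight * x - y                  # forward pass
        losses.append(float(np.mean(error ** 2)))
        # Chain rule: d(loss)/d(weight) = mean(2 * error * x).
        gradient = float(np.mean(2.0 * error * x))
        weight -= learning_rate * gradient      # parameter update
    return weight, losses

# Hypothetical data generated by y = 2 * x.
weight, losses = train_by_gradient_descent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In a multi-layer network the same gradient is obtained layer by layer, but the update rule has the same form as in this sketch.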

Mehta et al. and Csordás et al. are in the same art of disparity maps (Mehta et al., part 3, part 3.1; Csordás et al., col. 8, lines 20-40). The combination of Csordás et al. with Mehta et al. enables use of backpropagation. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the backpropagation of Csordás et al. with the invention of Mehta et al. as this was known at the time of filing, the combination would have predictable results, and as Csordás et al. indicate “In view of the known approaches, there is a demand for a method and an apparatus for generating a displacement map of a first input dataset and a .

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Mehta et al. (“Structured Adversarial Training for Unsupervised Monocular Depth Estimation”, 2018) as applied to claim 22 above, further in view of Chang et al. (US 20160150210 A1).

Regarding claim 25, Mehta et al. disclose the apparatus of claim 22. Mehta et al. further partly disclose the updating of the algorithm based on the difference between the second input batch and the test batch comprises updating the algorithm based on an average of respective differences between second images, among the one or more second images, and corresponding test images, among the two or more test images (Previous methods for depth estimation based on view synthesis [11, 54] rely on primitive reconstruction loss functions like mean-squared error, part 3), but another reference is added to make this more explicit.
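For illustration only, an average of respective differences between second images and corresponding test images can be sketched as follows. This is a minimal sketch and not taken from either reference: the function name `batch_average_difference` is hypothetical, and the use of a mean absolute pixel difference as the per-image difference is an assumption.

```python
import numpy as np

def batch_average_difference(second_images, test_images):
    """Average the per-image differences over a batch of image pairs."""
    per_image = [
        float(np.mean(np.abs(second - test)))   # difference for one image pair
        for second, test in zip(second_images, test_images)
    ]
    return float(np.mean(per_image)), per_image

# Hypothetical batch of two 2x2 images.
second_batch = [np.zeros((2, 2)), np.ones((2, 2))]
test_batch = [np.ones((2, 2)), np.ones((2, 2))]
average, per_image = batch_average_difference(second_batch, test_batch)
```

Here the first pair differs by 1 at every pixel and the second pair is identical, so the batch average is the mean of those two per-image values.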

Chang et al. teach updating of the algorithm based on the difference between the second input batch and the test batch comprises updating the algorithm based on an average of respective differences between second images, among the one or more second images, and corresponding 
[image: media_image7.png] ([0139]-[0143]).

Mehta et al. and Chang et al. are in the same art of disparity maps (Mehta et al., part 3, part 3.1; Chang et al., [0066]). The combination of Chang et al. with Mehta et al. enables use of the 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VINCENT M RUDOLPH can be reached on (571)272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center 





/MICHELLE M ENTEZARI/Primary Examiner, Art Unit 2661