DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
	The information disclosure statement, filed 11 June 2020, complies with the provisions of 37 CFR 1.97 and 1.98. It has been placed in the application file, and the information referred to therein has been considered as to the merits.¹ An initialed and dated copy of Applicant’s IDS form 1449 (Paper No. 20200611) is attached to the instant Office action.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: a depth module and a transformation module in claims 1-13.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-14 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Schulter et al. (US 2019/0096125 A1) in view of Qi et al. (US 2019/0147245 A1).
a.	Regarding claim 1, Schulter discloses a system for producing a bird's eye view image from a two dimensional image, the system comprising: 
one or more processors (Schulter discloses that “a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus” at ¶ 0026) (emphasis added); and 
a memory communicably coupled to the one or more processors and storing (Schulter discloses that “[a] data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus” at ¶ 0026) (emphasis added): 
a depth module including instructions that when executed by the one or more processors cause the one or more processors to receive the two dimensional image and to execute a first neural network to produce a depth map (Schulter discloses “inferring depths of the occluded objects by predicting depths in masked areas of the masked image with a depth in-painting network according to the contextual information . . . [and] mapping the foreground objects and the background objects to a three-dimensional space with a background mapping system according to locations of each of the foreground objects, the background objects and occluded object using the inferred depths” at Figs. 9-904 and 9-905 and ¶¶ 0109-0110. Schulter also discloses that “a depth predictor 404 included with the computer processing device 110 determines depth measurements for each foreground object. To determine the depth measurements, the depth predictor 404 can establish a depth map according to, e.g., a stereoscopic image, a neural network for predicting depths such as, e.g., a fully convolutional residual network, or other depth determination technique. The depth map can be applied to the foreground objects extracted by the object detector 402 to determine 3D dimensional coordinates for each foreground object” at Fig. 7-404 and ¶ 0032); and 
a transformation module including instructions that when executed by the one or more processors cause the one or more processors to receive the depth map and to execute a second neural network to produce the bird's eye view image (Schulter discloses “generating a bird's eye view from the three-dimensional space” at Fig. 9-906 and ¶ 0111. Schulter also discloses that “[t]he refinement network 310 can refine the initial bird's eye view generated by the view converter 304 using a trained refinement network including, e.g., a CNN trained to correct imperfects in the background objects of the initial bird's eye view. In one possible embodiment, the refinement network 310 includes, e.g., a CNN with an encoder-decoder structure and a fully-connected bottleneck layer” at Fig. 5-310 and ¶ 0073), 
wherein the second neural network implements a machine learning algorithm (Schulter discloses that “the in-painting network 806 establishes both features and depth values, similar to the depth predictor 804, a mapping system 808 can establish coordinates for each background object to localize the background objects in 3D space, such as, e.g., by generating a 3D point cloud. The 3D point cloud can be converted to a bird's eye view by eliminating an elevation component form the 3D point cloud, projecting the points onto a horizontal plane. Thus, a 2D, top-down map of the background objects is created” at Fig. 8-808 and ¶ 0102) that: 
preserves spatial gradient information associated with one or more objects included in the depth map (Schulter discloses that “the in-painting network 806 establishes both features and depth values, similar to the depth predictor 804, a mapping system 808 can establish coordinates for each background object to localize the background objects in 3D space, such as, e.g., by generating a 3D point cloud. The 3D point cloud can be converted to a bird's eye view by eliminating an elevation component form the 3D point cloud, projecting the points onto a horizontal plane. Thus, a 2D, top-down map of the background objects is created” at Fig. 8-808 and ¶ 0102). 
However, Schulter does not disclose that the machine learning algorithm causes a position of a pixel in an object, included in the bird's eye view image, to be represented by a differentiable function.  
Qi discloses causing a position of a pixel in an object, included in the bird's eye view image, to be represented by a differentiable function (Qi discloses that “extracting a three-dimensional frustum from the three-dimensional depth data using the attention region comprises aligning the frustums captured from a plurality of perspectives by rotating the depth data in each frustum by a yaw angle associated with each perspective. In some embodiments, the yaw angle is calculated using the position of the center pixel in the attention region. In some embodiments, aligning the frustums reduces the distribution space of the frustum point cloud, which improves segmentation performance” at ¶ 0063).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Qi’s process of calculating the position of the center pixel to Schulter’s bird's eye view generating process.
The suggestion/motivation would have been to “[allow] for the parameterized and tailoring of the learning engines by relying on naturally occurring geometric features, such as repetition, planarity, and symmetry” (Qi; ¶ 0005).
b.	Regarding claim 2, the combination applied in claim 1 discloses wherein a combination of the first neural network and the second neural network is end-to-end trainable (Schulter discloses a depth predictor and a refinement network, each of which is a trainable CNN, at ¶¶ 0032, 0073).
c.	Regarding claim 3, the combination applied in claim 1 discloses wherein the combination is end-to-end trainable without using a term that is a substitution for photometric loss (Schulter discloses a depth predictor and a refinement network, each implemented as a CNN, without using any such loss term at ¶¶ 0032, 0073).
d.	Regarding claim 4, the combination applied in claim 1 discloses wherein the combination is end-to-end trainable without depth supervision (Schulter discloses a depth predictor, which does not use labeled data for processing, at ¶ 0032).
e.	Regarding claim 5, the combination applied in claim 1 discloses further comprising a data store communicably coupled to the one or more processors and storing the two dimensional image (Schulter discloses that “[a] data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus” at ¶ 0026).  
f.	Regarding claim 6, the combination applied in claim 1 discloses wherein the data store is further configured to store the depth map (Schulter discloses that “[a] data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus” at ¶ 0026).  
g.	Regarding claim 7, the combination applied in claim 1 discloses wherein the data store is further configured to store the bird's eye view image (Schulter discloses that “[a] data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus” at ¶ 0026).  
h.	Regarding claim 8, the combination applied in claim 1 discloses wherein the system is a subsystem of a system for three dimensional object detection (Schulter discloses “an object detection and localization network” at Fig. 7 and ¶ 0039).
i.	Regarding claim 9, the combination applied in claim 1 discloses wherein the memory further stores a three dimensional object detection module that when executed by the one or more processors cause the one or more processors to: 
receive the two dimensional image (Schulter discloses “inferring depths of the occluded objects by predicting depths in masked areas of the masked image with a depth in-painting network according to the contextual information . . . [and] mapping the foreground objects and the background objects to a three-dimensional space with a background mapping system according to locations of each of the foreground objects, the background objects and occluded object using the inferred depths” at Figs. 9-904 and 9-905 and ¶¶ 0109-0110), the bird's eye view image (Schulter discloses “generating a bird's eye view from the three-dimensional space” at Fig. 9-906 and ¶ 0111. Schulter also discloses that “[t]he refinement network 310 can refine the initial bird's eye view generated by the view converter 304 using a trained refinement network including, e.g., a CNN trained to correct imperfects in the background objects of the initial bird's eye view. In one possible embodiment, the refinement network 310 includes, e.g., a CNN with an encoder-decoder structure and a fully-connected bottleneck layer” at Fig. 5-310 and ¶ 0073), and the spatial gradient information (Schulter discloses that “the in-painting network 806 establishes both features and depth values, similar to the depth predictor 804, a mapping system 808 can establish coordinates for each background object to localize the background objects in 3D space, such as, e.g., by generating a 3D point cloud. The 3D point cloud can be converted to a bird's eye view by eliminating an elevation component form the 3D point cloud, projecting the points onto a horizontal plane. Thus, a 2D, top-down map of the background objects is created” at Fig. 8-808 and ¶ 0102); and 
execute a third neural network to detect one or more three dimensional objects, wherein the system for three dimensional object detection includes the first neural network, the second neural network, and the third neural network (Schulter discloses “a depth predictor 404 included with the computer processing device 110 determines depth measurements for each foreground object. To determine the depth measurements, the depth predictor 404 can establish a depth map according to, e.g., a stereoscopic image, a neural network for predicting depths such as, e.g., a fully convolutional residual network, or other depth determination technique. The depth map can be applied to the foreground objects extracted by the object detector 402 to determine 3D dimensional coordinates for each foreground object” at Fig. 7-404 and ¶ 0032; Schulter also discloses that “[t]he refinement network 310 can refine the initial bird's eye view generated by the view converter 304 using a trained refinement network including, e.g., a CNN trained to correct imperfects in the background objects of the initial bird's eye view. In one possible embodiment, the refinement network 310 includes, e.g., a CNN with an encoder-decoder structure and a fully-connected bottleneck layer” at Fig. 5-310 and ¶ 0073; Schulter discloses that “the in-painting network 806 establishes both features and depth values, similar to the depth predictor 804, a mapping system 808 can establish coordinates for each background object to localize the background objects in 3D space, such as, e.g., by generating a 3D point cloud” at Fig. 8-808 and ¶ 0102).
j.	Regarding claim 10, the combination applied in claim 1 discloses further comprising a data store communicably coupled to the one or more processors and configured to store one or more representations for the one or more three dimensional objects (Schulter discloses that “[a] data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus” at ¶ 0026).  
k.	Regarding claim 11, the combination applied in claim 1 discloses wherein a combination of the first neural network, the second neural network, and the third neural network is end-to-end trainable (Schulter discloses a depth predictor, a refinement network, and an in-painting network, each of which is a trainable CNN, at ¶¶ 0032, 0073, 0102).
l.	Regarding claim 12, the combination applied in claim 1 discloses wherein the system for three dimensional object detection is a subsystem of a perception system (Schulter discloses “an object detection and localization network” at Fig. 7 and ¶ 0039).
m.	Regarding claim 13, the combination applied in claim 1 discloses wherein the perception system is a subsystem of an autonomous driving system (Schulter discloses an “autonomous vehicle 830 with a device to capturing images with a perspective view of a complex environment, such as, e.g., a complex road scene” at Fig. 8-830 and ¶ 0097). 
n.	Regarding claim 14, claim 14 is analogous and corresponds to claim 1. See rejection of claim 1 for further explanation.
o.	Regarding claim 16, the combination applied in claim 1 discloses wherein the two dimensional image comprises a monocular image (Qi discloses monocular RGB images at ¶ 0121).  
p.	Regarding claim 17, the combination applied in claim 1 discloses wherein the two dimensional image comprises a pair of images, the pair of images comprising a stereo image (Schulter discloses a stereoscopic image at ¶ 0032).
q.	Regarding claim 18, claim 18 is analogous and corresponds to claim 1. See rejection of claim 1 for further explanation.
r.	Regarding claim 19, claim 19 is analogous and corresponds to claim 9. See rejection of claim 9 for further explanation.
s.	Regarding claim 20, claim 20 is analogous and corresponds to claim 1. See rejection of claim 1 for further explanation.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Schulter et al. (US 2019/0096125 A1) in view of Qi et al. (US 2019/0147245 A1), and further in view of Chen et al. (US 2019/0304102 A1).
	a.	Regarding claim 15, the combination applied in claim 14 discloses all the previous claim limitations. However, the combination does not explicitly disclose wherein the differentiable function is a Gaussian function and the position of the pixel corresponds to a position of a mean value associated with the Gaussian function.  
	Chen discloses wherein the differentiable function is a Gaussian function and the position of the pixel corresponds to a position of a mean value associated with the Gaussian function (Chen discloses that “the background subtraction engine 312 can use a Gaussian distribution model for each pixel location, with parameters of mean and variance to model each pixel location in frames of a video sequence. All the values of previous pixels at a particular pixel location are used to calculate the mean and variance of the target Gaussian model for the pixel location. When a pixel at a given location in a new video frame is processed, its value will be evaluated by the current Gaussian distribution of this pixel location” at Fig. 3-312 and ¶¶ 0078-0082).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the Gaussian pixel model of Chen’s background subtraction engine to the combination.
The suggestion/motivation would have been to “provide efficient and robust [sic] sequence processing” (Chen; ¶ 0004).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN W LEE whose telephone number is (571)272-9554. The examiner can normally be reached Mon-Fri 8:00AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, NAY MAUNG can be reached on 571-272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHN W LEE/Primary Examiner, Art Unit 2664
¹ See MPEP § 609.