DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
2.	The amendments to independent claims 1, 18, and 19 overcome the previous 35
USC 103 rejection. However, the amendments necessitate new grounds of
rejection; see the 35 USC 103 rejection below.

Response to Arguments
3.	Applicant’s arguments with respect to claim(s) 1, 18, and 19 have been considered
but are moot because the new ground of rejection does not rely on any reference applied
in the prior rejection of record for any teaching or matter specifically challenged in the
argument.

Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	Claims 1-7, 9-11, 14, 16, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Graham ("Sparse 3D convolutional neural networks." arXiv preprint arXiv:1505.02890), in view of Urtasun (US 20190146497).

Regarding claim 1, Graham teaches:
A method comprising, by one or more computing systems: accessing a plurality of content objects (2.2 Object recognition; we used a dataset of 3D objects, each stored as a mesh of triangles in the OFF-file format. The dataset contains 1200 exemplars split evenly between 50 classes (aliens, ants, armadillo ...).);
	generating a plurality of voxelized representations for the plurality of content objects, respectively (Figure 4; Items from the 3D object dataset used in Section 2.2, embedded into a 40 × 40 × 40 cubic grid. Top row: four items from the snake class. Bottom row: an ant, an elephant, a robot and a tortoise.);
	determining, based on the voxelized representation of each of the plurality of content objects, one or more active sites for each of the plurality of content objects (1.3 Sparse operations; Calculating a 2×2 convolution for a sparse CNN: On the left is a 6×6 square grid with 3 active sites. The convolutional filter needs to be calculated at each location that covers at least one active site; this corresponds to the shaded region. The figure on the right marks the location of the eight active sites in the 5 × 5 output layer. Page 5); and
applying, to the one or more active sites, the one or more sparse convolutions (1.3 Sparse operations, Figure 3; Calculating a 2×2 convolution for a sparse CNN: On the left is a 6×6 square grid with 3 active sites.).
But Graham does not explicitly teach generating, based on one or more sparse convolutions that operate on active sites of content objects based on one or more of a filter or a stride, one or more building blocks; and training a machine-learning model based on a convolutional network, wherein the convolutional network comprises the one or more building blocks.
	However, Urtasun teaches:
	generating, based on one or more sparse convolutions that operate on active sites of content objects based on one or more of a filter or a stride, one or more building blocks (Paragraph 0167; The sparse block convolutions proposed herein also integrate well with residual units. In some implementations, a single residual unit contains three convolution, batch norm, and ReLU layers, all of which can be operated under sparse mode. The total increase in receptive field of a residual unit can be the same as a single 3×3 convolution. Therefore, in some implementations, all nine layers can share a single gathering and scattering operation without growing the overlap area between blocks. In addition to the computation savings, batch-normalizing across non-sparse elements contributes to better performance since it ignores non-valid data that may introduce noises to the statistics. As one example, FIG. 8 provides graphical diagrams of an example regular residual unit and an example sparse residual unit according to example embodiments of the present disclosure. Paragraph 0161; In some implementations, sparse gathering and scattering operations can convert the network between dense and sparse modes. In some implementations, unlike kernels that are implemented in deep learning libraries (e.g., tf.gather_nd,tf.scatter_nd), the proposed kernel not only operates on dense indices but also expands spatially to its neighborhood window. Paragraph 0163; Example scatter kernel: Scatter can perform the reverse operation of gather, reusing the same input mask and block index list. The input to scatter kernel can be a tensor of shape [B×h′×w′×C]. Examiner note: non-sparse elements are the active sites. In paragraph 0007 filters can also be referred to as kernels); and 
training a machine-learning model based on a convolutional network, wherein the convolutional network comprises the one or more building blocks (Paragraph 0167; In some implementations, a neural network can include a plurality of sparse residual units (e.g., stacked one after the other)).
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Urtasun in order to significantly reduce the computational complexity of standard convolutional and dense layers in deep neural networks. (Paragraph 0151).
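For illustration only, the active-site rule quoted from Graham for claim 1 (the filter is calculated at each location that covers at least one active site) can be sketched in Python. This is a minimal sketch, not Graham's implementation: the function name output_active_sites is hypothetical, a stride-1 convolution without padding is assumed, and the three site coordinates are invented so that they happen to yield eight active output sites on a 5 × 5 output grid. They are not taken from Graham's Figure 3.

```python
from itertools import product


def output_active_sites(active_in, size_in, f, stride=1):
    """Return output locations whose f x f window covers at least one active input site."""
    size_out = (size_in - f) // stride + 1
    active_out = set()
    for r, c in active_in:
        # Output (i, j) sees input rows i*stride .. i*stride + f - 1 (same for columns).
        for i, j in product(range(size_out), repeat=2):
            if i * stride <= r < i * stride + f and j * stride <= c < j * stride + f:
                active_out.add((i, j))
    return active_out


# Hypothetical layout: 3 active sites on a 6 x 6 grid, 2 x 2 filter, stride 1.
sites = {(2, 2), (2, 3), (5, 3)}
out = output_active_sites(sites, size_in=6, f=2)
print(len(out), sorted(out))
```

The sketch only tracks which locations are active; it does not compute any filter responses.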

Regarding claim 2, modified Graham teaches the method of Claim 1, and Urtasun also teaches:
	wherein each of the plurality of content objects comprises a three-dimensional (3D) point cloud comprising a plurality of points. (Paragraph 0037; the input imagery is sparse in nature. As one example, the input imagery can include LIDAR imagery produced by a LIDAR system. For example, the LIDAR imagery can be a three-dimensional point cloud, where the point cloud is highly sparse.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Urtasun because the point cloud can describe the locations of detected objects in three-dimensional space and, for many (most) locations in three-dimensional space, there was not an object detected at such location. (Paragraph 0037).

Regarding claim 3, modified Graham teaches the method of Claim 2, and Urtasun also teaches:
	wherein generating the voxelized representation for each content object comprises: determining, for the 3D point cloud, one or more voxels, wherein each voxel comprises one or more points. (Paragraph 0045; the input imagery can be a three-dimensional point cloud of LIDAR data. To generate the binary mask, the three-dimensional space can be divided into a plurality of voxels.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Urtasun because the computing system can determine the amount of data (e.g., the number of LIDAR data points) included in each voxel and can classify each voxel as either sparse or non-sparse based on the amount of data included in such voxel (Paragraph 0045).
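For illustration only, the voxelization described in the cited Urtasun passage (dividing three-dimensional space into voxels and classifying each voxel by the number of points it contains) can be sketched in Python. This is a minimal sketch under stated assumptions, not Urtasun's implementation: the function names voxelize and non_sparse_voxels, the cubic voxel size, the min_points threshold, and the sample points are all hypothetical.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group 3D points by the integer index of the cubic voxel that contains them."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def non_sparse_voxels(voxels, min_points=1):
    """Return indices of voxels holding at least min_points points (assumed threshold)."""
    return {key for key, pts in voxels.items() if len(pts) >= min_points}


# Hypothetical point cloud with four points.
cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (2.5, 0.0, 1.1), (-0.2, 0.3, 0.0)]
voxels = voxelize(cloud, voxel_size=1.0)
mask = non_sparse_voxels(voxels, min_points=2)
print(sorted(voxels), mask)
```

Only occupied voxels are stored, so empty regions of space cost no memory in this sketch.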

Regarding claim 4, modified Graham teaches the method of Claim 2, and Urtasun also teaches:
	wherein the content object comprises one or more parts, and wherein one or more points of the plurality of points are associated with a part label corresponding to one of the one or more parts (Paragraph 0025; The computing system can provide each of the one or more relevant portions of the imagery to a machine-learned convolutional neural network and receive at least one prediction from the machine-learned convolutional neural network based at least in part on the one or more relevant portions of the imagery. Paragraph 0126; the point cloud can describe the locations of detected objects in three-dimensional space and, for many (most) locations in three-dimensional space).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Urtasun in order to detect object(s) in a surrounding environment of the autonomous vehicle, as depicted by the imagery. (Paragraph 0110).

Regarding claim 5, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein the convolutional network is based on a three-dimensional architecture (2 Experiments ; We have performed experiments to test triangular and sparse 3D CNNs.).
	
Regarding claim 6, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein each of the one or more sparse convolutions correlates the one or more active sites with one or more output based on one or more filters and one or more strides (1.1 Adding a dimension to 2D; To apply a 7×7×7 convolutional filter with the same stride. 1.2 CNNs on different lattices; The n counts the number of convolutional filters… on the square, cubic, triangular and tetrahedral lattices, respectively—and s denotes the stride.).

Regarding claim 7, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein the convolutional network comprises a plurality of layers, each layer comprising a plurality of network blocks (2.1 Square versus triangular 2D convolutions;  Both networks have 12 small convolutional layers split into pairs by 5 layers of max-pooling, and with the n-th pair of convolutional filters each having 32n output features. Note: Network blocks could describe a single layer, a component consisting of multiple layers, or the entire model itself.)

Regarding claim 9, modified Graham teaches the method of Claim 1, and Urtasun also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more activation functions to the one or more active sites (Paragraph 0125; FIGS. 3A-C provide example illustrations of various stages of an example processing pipeline. FIG. 3A depicts a graphical diagram of example LIDAR imagery 300 according to example embodiments of the present disclosure. The imagery 300 is primarily sparse but does include a number of pixels 302a-d that have are non-sparse. Paragraph 0167; The sparse block convolutions proposed herein also integrate well with residual units. In some implementations, a single residual unit contains three convolution, batch norm, and ReLU layers, all of which can be operated under sparse mode.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Urtasun in order to provide a sparse update to hidden features. (Paragraph 0072).

Regarding claim 10, modified Graham teaches the method of Claim 1, and Urtasun also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more batch normalizations to the one or more active sites (Paragraph 0125; FIGS. 3A-C provide example illustrations of various stages of an example processing pipeline. FIG. 3A depicts a graphical diagram of example LIDAR imagery 300 according to example embodiments of the present disclosure. The imagery 300 is primarily sparse but does include a number of pixels 302a-d that have are non-sparse. Paragraph 0167; The sparse block convolutions proposed herein also integrate well with residual units. In some implementations, a single residual unit contains three convolution, batch norm, and ReLU layers, all of which can be operated under sparse mode.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Urtasun because batch-normalizing across non-sparse elements contributes to better performance since it ignores non-valid data that may introduce noises to the statistics. (Paragraph 0167).
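For illustration only, the idea in the cited passage of Paragraph 0167 (batch-normalizing across non-sparse elements so that non-valid data does not enter the statistics) can be sketched in Python. This is a minimal sketch, not Urtasun's implementation: the function name masked_batch_norm and the sample values are hypothetical, a single feature channel is assumed, and the learned scale and shift and the running statistics of a full batch-normalization layer are omitted.

```python
from math import sqrt


def masked_batch_norm(values, mask, eps=1e-5):
    """Normalize only entries where mask is True; inactive entries are left unchanged."""
    active = [v for v, m in zip(values, mask) if m]
    if not active:
        return list(values)
    # Mean and variance are computed over active entries only.
    mean = sum(active) / len(active)
    var = sum((v - mean) ** 2 for v in active) / len(active)
    scale = 1.0 / sqrt(var + eps)
    return [(v - mean) * scale if m else v for v, m in zip(values, mask)]


# Hypothetical feature values; zeros at masked-out positions are ignored.
features = [0.0, 2.0, 0.0, 4.0, 0.0, 6.0]
mask = [False, True, False, True, False, True]
normed = masked_batch_norm(features, mask)
print(normed)
```

Had the zeros at inactive positions been included, the mean would have been 2.0 rather than 4.0, which is the kind of distortion the cited passage attributes to non-valid data.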

Regarding claim 11, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more downsampling operations to the one or more active sites, wherein each downsampling operation comprises one or more of pooling or strided convolution, and wherein each pooling comprises one or more of max pooling or average pooling (2.2 Object recognition; We also tried a stochastic form of max-pooling on the cubic lattice which we denote FMP; we used FMP to downsample the hidden layer by a factor of 2^(2/3) ≈ 1.59; this allows us to gently increase the number of learnt layers for a given input scale. See Figure 5).

Regarding claim 14, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more linear operations to the one or more active sites (1.3 Sparse operations; this can both dramatically reduce memory requirements and speed up operations such as matrix multiplication. Calculate M_out = Q × W + B).

Regarding claim 16, modified Graham teaches the method of Claim 1, and Graham also teaches:
	further comprising: generating one or more hash tables and one or more rule books, wherein the one or more hash tables comprise location information associated with a plurality of active sites of the plurality of content objects, and wherein the one or more rule books comprise a plurality of input-output pairs associated with the plurality of active sites, the input-output pairs being determined based on the one or more sparse convolutions (1.3 Sparse operations; A matrix M_in with size a_in × n_in. Each row corresponds to the vector at one of the active spatial locations. A map or hash table H_in of (key, value) pairs. The keys are the active spatial locations. The values record the number of the corresponding row in M_in. Use H_in, M_in and g_in to build a matrix Q of size a_out × (f^2 n_in); each row of Q should correspond to the inputs visible to the convolutional filter at the corresponding output spatial location. Q identifies input-output pairs (inputs visible…at the corresponding output spatial location)).
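For illustration only, the hash table and input-output pairing described in the cited Graham passage can be sketched in Python. This is a minimal sketch under stated assumptions, not Graham's implementation: the function names build_hash_table and build_rule_book are hypothetical, a stride-1 f × f filter without padding is assumed, and the site coordinates are invented. The term "rule book" is taken from the claim language; Graham describes the same pairing through the rows of the matrix Q.

```python
from itertools import product


def build_hash_table(active_sites):
    """Map each active location (key) to its row number in the feature matrix (value)."""
    return {site: row for row, site in enumerate(sorted(active_sites))}


def build_rule_book(h_in, size_in, f):
    """List (filter_offset, input_row, output_row) rules for an f x f filter, stride 1."""
    size_out = size_in - f + 1
    h_out = {}
    rules = []
    for (r, c), in_row in sorted(h_in.items()):
        for dr, dc in product(range(f), repeat=2):
            # Output location that sees input (r, c) at filter offset (dr, dc).
            i, j = r - dr, c - dc
            if 0 <= i < size_out and 0 <= j < size_out:
                out_row = h_out.setdefault((i, j), len(h_out))
                rules.append(((dr, dc), in_row, out_row))
    return h_out, rules


# Hypothetical layout: 3 active sites on a 6 x 6 grid, 2 x 2 filter.
h_in = build_hash_table({(2, 2), (2, 3), (5, 3)})
h_out, rules = build_rule_book(h_in, size_in=6, f=2)
print(len(h_out), len(rules))
```

Each rule names one input row that is visible to the filter at one output row, which is the information a row of Q encodes for that output location.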

Regarding Claim 18, it is a computer-readable non-transitory storage media claim that corresponds to the method of Claim 1. It is substantially similar to Claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Urtasun also teaches one or more computer-readable non-transitory storage media embodying software that is operable when executed to (Paragraph 0081; The memory 114 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof. The memory 114 can store data 116 and instructions 118 which are executed by the processor 112 to cause autonomy computing system 102 to perform operations.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Urtasun in order to store data 116 and instructions 118 which are executed by the processor 112 to cause autonomy computing system 102 to perform operations (Paragraph 0081).
	
Regarding Claim 19, it is a system claim that corresponds to the method of Claim 1. It is substantially similar to Claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Urtasun also teaches a system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: access a plurality of content objects (Paragraph 0081; The autonomy computing system 102 includes one or more processors 112 and a memory 114. The one or more processors 112 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the system of Graham and the system of Urtasun in order to cause autonomy computing system 102 to perform operations (Paragraph 0081).
	
7.	Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Graham ("Sparse 3D convolutional neural networks." arXiv preprint arXiv:1505.02890), in view of Urtasun (US 20190146497), in further view of Li, et al. ("Convolutional neural network-based block up-sampling for intra frame coding." IEEE Transactions on Circuits and Systems for Video Technology 28.9).

Regarding claim 12, modified Graham teaches the method of Claim 1 and the one or more active sites (1.3 Sparse operations; determine the number a_out of active spatial locations in the output layer.), but modified Graham does not explicitly teach wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more deconvolution operations to the one or more active sites.
However, Li teaches:
wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more deconvolution operations to the one or more active sites (IV. CNN-BASED UP-SAMPLING, A. CNN for Luma Up-Sampling, 2) Deconvolution, p.2320; the deconvolution layer is used to enlarge the multiscale feature maps and the enlarged features are then used to reconstruct HR image. “As shown in Fig. 2, the third layer performs deconvolution of the multi-scale feature maps extracted by the second layer.”)
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of modified Graham and the method of Li in order to enlarge the multiscale feature maps so that the enlarged features can then be used to reconstruct the HR image (IV. CNN-BASED UP-SAMPLING, A. CNN for Luma Up-Sampling, 2) Deconvolution, p.2320).

Regarding claim 13, modified Graham teaches the method of Claim 1, and Li also teaches:
wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more upsampling operations to the one or more active sites. (VI. TWO-STAGE UP-SAMPLING, p.2324; The second stage of up-sampling is performed for only the CTUs that have chosen the low-resolution coding mode, and the up-sampling method (CNN-based or DCTIF) is already decided in the first stage. The up-sampling result of the second stage just replaces that of the first stage.)
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of modified Graham and the method of Li in order to achieve higher image reconstruction quality and a simpler network structure (I. INTRODUCTION).

8.	Claims 8, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Graham ("Sparse 3D convolutional neural networks." arXiv preprint arXiv:1505.02890), in view of Urtasun (US 20190146497), in further view of Wang (US 20190147335).

Regarding claim 8, modified Graham teaches the method of Claim 7, and Graham also teaches:
	where training the machine-learning model comprises: selecting one or more layers from the plurality of layers (1.3 Sparse operations; The spatial size of each of the CNN’s data layers is described by a lattice-type graph (similar to the ones in Figure 2). At each spatial location in the grid, there is a dimension-less vector of input or hidden units. 2.2 Object recognition; All the CNNs we tested took the form: 32C2 − pooling − 64C2 − pooling − 96C2 − ... − output. We rendered the 3D models at a variety of different scales, and varied the number of levels of pooling accordingly. We tried using MP3/2 pooling on the cubic and tetrahedral lattices. We also tried a stochastic form of max-pooling on the cubic lattice which we denote FMP[2]);
	But Graham does not explicitly teach inserting, for each of the selected layers, the one or more building blocks in between at least two of the plurality of network blocks associated with the layer.
	However, Wang teaches:
inserting, for each of the selected layers, the one or more building blocks in
between at least two of the plurality of network blocks associated with the layer (Paragraph 0121; One or more parametric continuous convolution layers may be used as building blocks to construct a family of deep neural networks which
operate on unstructured data defined in a topological group under addition.)
Further, it would have been obvious to one of ordinary skill in the art, prior to the
effective filing date, to combine the method of Graham and the method of Wang in order
to construct a family of deep neural networks which operate on unstructured data defined in a topological group under addition. (Paragraph 0121).

Regarding claim 15, modified Graham teaches the method of Claim 1, but modified Graham does not explicitly teach wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more softmax operations to the one or more active sites.
However, Wang teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more softmax operations to the one or more active sites (Paragraph 0145; The output of the concatenation component 772 is provided to a softmax component 774. The softmax component 774 can include a fully connected layer with softmax activation to produce a final output.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of modified Graham and the method of Wang in order to help convergence (Paragraph 0129) and to produce a final output that includes semantic labels for portions of the input sensor data based on the output of the concatenation component (Paragraph 0145).

Regarding claim 17, modified Graham teaches the method of Claim 1, and Wang also teaches:
	further comprising: receiving a querying content object comprising a three-dimensional (3D) point cloud, wherein the 3D point cloud comprises a plurality of points; and determining, for each of the plurality of points, a part label based on the machine-learning model. (Paragraph 0145; the final output includes semantic labels for portions of the input sensor data based on the output of concatenation component 772. In some examples, a semantic label may be generated for each pixel or each point of a point cloud from the input data. The semantic label 776 can be provided to a cross entropy component 778 in one example. The cross entropy component 778 may apply back propagation to train the machine-learned model 701 in some implementations.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of modified Graham and Wang in order to predict physical objects in an environment external to the autonomous vehicle (Paragraph 0037).

Conclusion
9.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

10.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHAT MINH DANG whose telephone number is (571)272-8665. The examiner can normally be reached Monday - Friday 7:30am - 5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/P.M.D./Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121