DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
2.	The information disclosure statements (IDS) submitted on 04/09/2019 and 01/07/2021 are being considered by the examiner.

Claim Rejections - 35 USC § 103
3.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
4.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5.	Claims 1-11 and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Graham ("Sparse 3D convolutional neural networks." arXiv preprint arXiv:1505.02890.), in view of Wang (US 20190147335).

Regarding claim 1, Graham teaches:
A method comprising, by one or more computing systems: accessing a plurality of content objects (2.2 Object recognition; we used a dataset of 3D objects, each stored as a mesh of triangles in the OFF-file format. The dataset contains 1200 exemplars split evenly between 50 classes (aliens, ants, armadillo ...).);
	generating a plurality of voxelized representations for the plurality of content objects, respectively (Figure 4; Items from the 3D object dataset used in Section 2.2, embedded into a 40 × 40 × 40 cubic grid. Top row: four items from the snake class. Bottom row: an ant, an elephant, a robot and a tortoise.);
	determining, based on the voxelized representation of each of the plurality of content objects, one or more active sites for each of the plurality of content objects (1.3 Sparse operations, Figure 3, page 5; Calculating a 2×2 convolution for a sparse CNN: On the left is a 6×6 square grid with 3 active sites. The convolutional filter needs to be calculated at each location that covers at least one active site; this corresponds to the shaded region. The figure on the right marks the location of the eight active sites in the 5 × 5 output layer.); and
applying, to the one or more active sites, the one or more sparse convolutions (1.3 Sparse operations, Figure 3; Calculating a 2×2 convolution for a sparse CNN: On the left is a 6×6 square grid with 3 active sites.).
But Graham does not explicitly teach generating, based on one or more sparse convolutions, one or more building blocks and training a machine-learning model based on a convolutional network, wherein the convolutional network comprises the one or more building blocks.
	However, Wang teaches:
	generating, based on one or more sparse convolutions, one or more building blocks (Paragraph 0122; In some examples, the building blocks including parametric continuous convolution layers can be differentiable operators, such that the networks can be learned through back propagation, according to Equation 6. Paragraph 0132; The set of input point locations 712 and the set of input point features 714 are provided to a sparse indexing component 734. Additionally, the support point indices 724 are provided from K-dimensional tree component 722 to sparse indexing component 734. Sparse indexing component 734 generates a first output including a set of support point locations.); and 
training a machine-learning model based on a convolutional network, wherein the convolutional network comprises the one or more building blocks (Wang, Paragraph 0122; In some examples, the building blocks including parametric continuous convolution layers can be differentiable operators, such that the networks can be learned through back propagation, according to Equation 6. Paragraph 0132; The set of input point locations 712 and the set of input point features 714 are provided to a sparse indexing component 734. Additionally, the support point indices 724 are provided from K-dimensional tree component 722 to sparse indexing component 734. Sparse indexing component 734 generates a first output including a set of support point locations (e.g., support point coordinates).).
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Wang in order to construct a family of deep neural networks which operate on unstructured data defined in a topological group under addition (Paragraph 0121).
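
For illustration only, the active-site determination quoted above from Graham's Section 1.3 can be sketched as follows. This is a minimal sketch, assuming a stride-1 convolution without padding on a square or cubic grid; the function name `output_active_sites`, its parameters, and the sample site positions are hypothetical and are not taken from Graham, Wang, or the claims.

```python
from itertools import product


def output_active_sites(active_inputs, grid_size, filter_size):
    """Return output locations whose filter window covers at least one active input.

    Assumes a stride-1, unpadded convolution, so an input grid of side
    grid_size yields an output grid of side grid_size - filter_size + 1.
    """
    out_size = grid_size - filter_size + 1
    outputs = set()
    for site in active_inputs:
        # An output at position p sees inputs p .. p + filter_size - 1 on each
        # axis, so the input at `site` is visible from outputs site - offset.
        for offset in product(range(filter_size), repeat=len(site)):
            candidate = tuple(s - o for s, o in zip(site, offset))
            if all(0 <= c < out_size for c in candidate):
                outputs.add(candidate)
    return outputs


# Hypothetical 6x6 grid with three active sites and a 2x2 filter.
sites = output_active_sites({(0, 0), (2, 3), (5, 5)}, grid_size=6, filter_size=2)
print(sorted(sites))
```

The same function handles a cubic grid when the sites are 3-tuples, since the filter offsets are enumerated over however many axes each site has.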

Regarding claim 2, modified Graham teaches the method of Claim 1, and Wang also teaches:
	wherein each of the plurality of content objects comprises a three-dimensional (3D) point cloud comprising a plurality of points (Paragraph 0095; the input data may include LIDAR imagery produced by a LIDAR system, such as three-dimensional point clouds, where the point clouds can be highly sparse. By way of example, a 3D point cloud can describe the locations of detected objects in three-dimensional space. For many locations in the three-dimensional space, there may not be an object detected at such location. Additional examples of sensor data include imagery captured by one or more cameras or other sensors, including, as examples: visible spectrum imagery (e.g., humanly-perceivable wavelengths); infrared imagery; imagery that depicts RADAR data produced by a RADAR system; heat maps; data visualizations; or other forms of imagery.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Wang because this permits the continuous convolutional neural network to generate output features in association with output points in an output domain for which there is no corresponding input feature in a support domain (Paragraph 0028).

Regarding claim 3, modified Graham teaches the method of Claim 2, and Wang also teaches:
	wherein generating the voxelized representation for each content object comprises: determining, for the 3D point cloud, one or more voxels, wherein each voxel comprises one or more points. (Paragraph 0036; voxel-wise predictions can be mapped back to original points and metrics computed over points.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Wang because this permits the continuous convolutional neural network to generate output features in association with output points in an output domain for which there is no corresponding input feature in a support domain (Paragraph 0028).
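
For illustration only, the claimed step of determining voxels for a 3D point cloud, where each voxel comprises one or more points, can be sketched as follows. This is a minimal sketch, assuming an axis-aligned grid of cubic voxels; the function name `voxelize`, the `voxel_size` parameter, and the sample coordinates are hypothetical and do not come from either reference.

```python
def voxelize(points, voxel_size):
    """Group 3D points by the integer index of the voxel that contains them."""
    voxels = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels.setdefault(key, []).append((x, y, z))
    return voxels


# Hypothetical three-point cloud on a grid of unit-size voxels.
cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (2.5, 0.0, 1.1)]
voxels = voxelize(cloud, voxel_size=1.0)
active_sites = sorted(voxels)  # every occupied voxel holds one or more points
```

Only occupied voxels appear as keys, so the key set can serve directly as the list of active sites for a sparse representation.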

Regarding claim 4, modified Graham teaches the method of Claim 2, and Wang also teaches:
	wherein the content object comprises one or more parts, and wherein one or more points of the plurality of points are associated with a part label corresponding to one of the one or more parts (Paragraph 0095; the input data may include LIDAR imagery produced by a LIDAR system, such as three-dimensional point clouds, where the point clouds can be highly sparse. By way of example, a 3D point cloud can describe the locations of detected objects in three-dimensional space. For many locations in the three-dimensional space, there may not be an object detected at such location. Paragraph 0145; the final output includes semantic labels for portions of the input sensor data based on the output of concatenation component 772.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Wang in order to predict physical objects in an environment external to the autonomous vehicle (Paragraph 0037).

Regarding claim 5, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein the convolutional network is based on a three-dimensional architecture (2 Experiments; We have performed experiments to test triangular and sparse 3D CNNs.).
	
Regarding claim 6, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein each of the one or more sparse convolutions correlates the one or more active sites with one or more output based on one or more filters and one or more strides (1.1 Adding a dimension to 2D; To apply a 7×7×7 convolutional filter with the same stride. 1.2 CNNs on different lattices; The n counts the number of convolutional filters… on the square, cubic, triangular and tetrahedral lattices, respectively—and s denotes the stride.).


Regarding claim 7, Graham also teaches:
	wherein the convolutional network comprises a plurality of layers, each layer comprising a plurality of network blocks (2.1 Square versus triangular 2D convolutions; Both networks have 12 small convolutional layers split into pairs by 5 layers of max-pooling, and with the n-th pair of convolutional filters each having 32n output features. Note: Network blocks could describe a single layer, a component consisting of multiple layers, or the entire model itself.).

Regarding claim 8, modified Graham teaches the method of Claim 7, and Graham also teaches:
	where training the machine-learning model comprises: selecting one or more layers from the plurality of layers (1.3 Sparse operations; The spatial size of each of the CNN’s data layers is described by a lattice-type graph (similar to the ones in Figure 2). At each spatial location in the grid, there is a dimension-less vector of input or hidden units. 2.2 Object recognition; All the CNNs we tested took the form: 32C2 − pooling − 64C2 − pooling − 96C2 − ... − output. We rendered the 3D models at a variety of different scales, and varied the number of levels of pooling accordingly. We tried using MP3/2 pooling on the cubic and tetrahedral lattices. We also tried a stochastic form of max-pooling on the cubic lattice which we denote FMP[2]);
	But Graham does not explicitly teach inserting, for each of the selected layers, the one or more building blocks in between at least two of the plurality of network blocks associated with the layer.

	However, Wang teaches:
	inserting, for each of the selected layers, the one or more building blocks in between at least two of the plurality of network blocks associated with the layer (Paragraph 0121; One or more parametric continuous convolution layers may be used as building blocks to construct a family of deep neural networks which operate on unstructured data defined in a topological group under addition.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Wang in order to construct a family of deep neural networks which operate on unstructured data defined in a topological group under addition (Paragraph 0121).

Regarding claim 9, modified Graham teaches the method of Claim 1, and Wang also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more activation functions to the one or more active sites (Paragraph 0144; The first continuous convolution layer 752 performs one or more continuous convolutions using the input sensor data and the supporting point indices. Each continuous convolution layer except for the final continuous convolution layer of the set has 32 dimensional hidden features followed by batch normalization and ReLU nonlinearity.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Wang in order to help convergence (Paragraph 0129).

Regarding claim 10, modified Graham teaches the method of Claim 1, and Wang also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more batch normalizations to the one or more active sites (Paragraph 0144; The first continuous convolution layer 752 performs one or more continuous convolutions using the input sensor data and the supporting point indices. Each continuous convolution layer except for the final continuous convolution layer of the set has 32 dimensional hidden features followed by batch normalization and ReLU nonlinearity.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Wang in order to help convergence (Paragraph 0129).
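
For illustration only, batch normalization followed by a ReLU nonlinearity, as described in the cited Paragraph 0144 of Wang, can be restricted to active sites along the following lines. This is a minimal sketch, assuming the features of the active sites are stored one row per site and that the learned scale and shift parameters are omitted; the function name `batch_norm_relu` and the `eps` default are hypothetical.

```python
import math


def batch_norm_relu(features, eps=1e-5):
    """Normalize each feature channel over the active rows, then apply ReLU.

    `features` has one row per active site. Inactive sites are never stored,
    so they are left untouched. Learned scale and shift are omitted.
    """
    n = len(features)
    channels = len(features[0])
    means = [sum(row[c] for row in features) / n for c in range(channels)]
    variances = [
        sum((row[c] - means[c]) ** 2 for row in features) / n
        for c in range(channels)
    ]
    return [
        [
            max(0.0, (row[c] - means[c]) / math.sqrt(variances[c] + eps))
            for c in range(channels)
        ]
        for row in features
    ]


# Two hypothetical active sites with two feature channels each.
rows = [[1.0, -2.0], [3.0, 2.0]]
out = batch_norm_relu(rows)
```

Because statistics are computed only over stored rows, the set of active sites is the same before and after the operation.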

Regarding claim 11, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more downsampling operations to the one or more active sites, wherein each downsampling operation comprises one or more of pooling or strided convolution, and wherein each pooling comprises one or more of max pooling or average pooling (2.2 Object recognition; We also tried a stochastic form of max-pooling on the cubic lattice which we denote FMP; we used FMP to downsample the hidden layer).
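
For illustration only, downsampling by max pooling over active sites can be sketched as follows. Note that this is ordinary non-overlapping max pooling, substituted here as a simpler stand-in for the stochastic FMP pooling that Graham describes. The function name `sparse_max_pool`, the dictionary layout, and the choice to take the maximum over active sites only (ignoring implicit zeros at inactive sites) are assumptions made for the sketch.

```python
def sparse_max_pool(features, pool_size=2):
    """Max-pool a sparse 2D feature map stored as {(row, col): value}.

    Windows are non-overlapping squares of side pool_size. Only windows that
    contain at least one active site produce an output, taken as the maximum
    over the active sites in that window.
    """
    pooled = {}
    for (r, c), value in features.items():
        key = (r // pool_size, c // pool_size)
        pooled[key] = max(value, pooled.get(key, value))
    return pooled


# Three hypothetical active sites; two of them share a pooling window.
pooled = sparse_max_pool({(0, 0): 1.0, (1, 1): 3.0, (4, 5): -2.0})
```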


Regarding claim 14, modified Graham teaches the method of Claim 1, and Graham also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more linear operations to the one or more active sites (1.3 Sparse operations; this can both dramatically reduce memory requirements and speed up operations such as matrix multiplication. Calculate M_out = Q × W + B).
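
For illustration only, the linear operation M_out = Q × W + B quoted from Graham can be sketched in plain Python as follows. This is a minimal sketch, assuming Q holds one row per active output location, W is the weight matrix, and B is a bias vector added to every row; the function name `linear` and all numeric values are hypothetical.

```python
def linear(Q, W, B):
    """Compute M_out = Q x W + B, with the bias B added to every row."""
    return [
        [sum(q * W[k][j] for k, q in enumerate(row)) + B[j] for j in range(len(B))]
        for row in Q
    ]


Q = [[1.0, 2.0], [0.0, 1.0]]            # one row per active output location
W = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]  # hypothetical weights
B = [0.5, 0.5, 0.5]                     # hypothetical bias
M_out = linear(Q, W, B)
```

Since Q has rows only for active output locations, the cost of the multiplication scales with the number of active sites and not with the full grid size.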

Regarding claim 15, modified Graham teaches the method of Claim 1, and Wang also teaches:
	wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more softmax operations to the one or more active sites (Paragraph 0145; The output of the concatenation component 772 is provided to a softmax component 774. The softmax component 774 can include a fully connected layer with softmax activation to produce a final output.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and the method of Wang in order to help convergence (Paragraph 0129) and to produce a final output that includes semantic labels for portions of the input sensor data based on the output of the concatenation component (Paragraph 0145).
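
For illustration only, a softmax operation applied to the class scores of a single active site can be sketched as follows. This is a minimal sketch, assuming the scores for one site are given as a list of floats; the function name `softmax` and the sample scores are hypothetical and are not drawn from Wang's softmax component 774.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one active site's class scores."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
label = probs.index(max(probs))  # index of the most probable class
```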


Regarding claim 16, Graham also teaches:
	further comprising: generating one or more hash tables and one or more rule books, wherein the one or more hash tables comprise location information associated with a plurality of active sites of the plurality of content objects, and wherein the one or more rule books comprise a plurality of input-output pairs associated with the plurality of active sites, the input-output pairs being determined based on the one or more sparse convolutions (1.3 Sparse operations; A matrix M_in with size a_in × n_in. Each row corresponds to the vector at one of the active spatial locations. A map or hash table H_in of (key,value) pairs. The keys are the active spatial locations. The values record the number of the corresponding row in M_in. Use H_in, M_in and g_in to build a matrix Q of size a_out × (f^2 n_in); each row of Q should correspond to the inputs visible to the convolutional filter at the corresponding output spatial location. Note: Q identifies input-output pairs (inputs visible…at corresponding output spatial location).).
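
For illustration only, a hash table of active locations and a list of input-output pairs per filter offset can be built along the following lines. This is a minimal sketch, assuming a stride-1 convolution without padding; the function name `build_rule_book`, the dictionary layout, and the row-numbering scheme are hypothetical and are not the data structures of Graham or of the application.

```python
from itertools import product


def build_rule_book(active_inputs, grid_size, filter_size):
    """Build input/output hash tables and a rule book for a sparse convolution.

    Each hash table maps an active location to its row number. The rule book
    maps each filter offset to a list of (input_row, output_row) pairs.
    """
    h_in = {site: row for row, site in enumerate(sorted(active_inputs))}
    out_size = grid_size - filter_size + 1
    h_out = {}
    rule_book = {}
    for site, in_row in h_in.items():
        for offset in product(range(filter_size), repeat=len(site)):
            out_site = tuple(s - o for s, o in zip(site, offset))
            if not all(0 <= c < out_size for c in out_site):
                continue  # this window position falls outside the output grid
            # A newly seen output location receives the next free row number.
            out_row = h_out.setdefault(out_site, len(h_out))
            rule_book.setdefault(offset, []).append((in_row, out_row))
    return h_in, h_out, rule_book


# Hypothetical 6x6 grid with two active sites and a 2x2 filter.
h_in, h_out, rules = build_rule_book({(0, 0), (2, 3)}, grid_size=6, filter_size=2)
```

Grouping the pairs by filter offset means each offset's weights can be applied to all of its pairs in one batched step.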

Regarding claim 17, modified Graham teaches the method of Claim 1, and Wang also teaches:
	further comprising: receiving a querying content object comprising a three-dimensional (3D) point cloud, wherein the 3D point cloud comprises a plurality of points; and determining, for each of the plurality of points, a part label based on the machine-learning model. (Paragraph 0145; the final output includes semantic labels for portions of the input sensor data based on the output of concatenation component 772. In some examples, a semantic label may be generated for each pixel or each point of a point cloud from the input data. The semantic label 776 can be provided to a cross entropy component 778 in one example. The cross entropy component 778 may apply back propagation to train the machine-learned model 701 in some implementations.).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Wang in order to predict physical objects in an environment external to the autonomous vehicle (Paragraph 0037).

Regarding Claim 18, it is substantially similar to Claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Wang also teaches one or more computer-readable non-transitory storage media embodying software that is operable when executed to (Claim 1; one or more non-transitory computer-readable media that store instructions that, when executed by the one or more processors).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of Graham and Wang in order to perform operations of the autonomous vehicle (Claim 1).
	
Regarding Claim 19, it is substantially similar to Claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Wang also teaches a system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: access a plurality of content objects (Claim 19; A computing system, comprising: one or more processors; one or more non-transitory computer-readable media that store: a machine-learned neural network that comprises one or more fusion layers, wherein at least one of the one or more fusion layers is configured to fuse input data of different modalities; instructions that, when executed by the one or more processors, cause the computing system to perform operations).
	Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the system of Graham and the system of Wang in order to perform operations of the autonomous vehicle (Claim 1).
	
6.	Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Graham ("Sparse 3D convolutional neural networks." arXiv preprint arXiv:1505.02890.), in view of Wang (US 20190147335), and further in view of Li, et al. ("Convolutional neural network-based block up-sampling for intra frame coding." IEEE Transactions on Circuits and Systems for Video Technology 28.9).

Regarding claim 12, modified Graham teaches the method of Claim 1 and the one or more active sites (1.3 Sparse operations; determine the number a_out of active spatial locations in the output layer.), but modified Graham does not explicitly teach wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more deconvolution operations to the one or more active sites.
However, Li teaches:
wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more deconvolution operations to the one or more active sites (IV. CNN-BASED UP-SAMPLING, A. CNN for Luma Up-Sampling, 2) Deconvolution, p.2320; the deconvolution layer is used to enlarge the multiscale feature maps and the enlarged features are then used to reconstruct HR image. “As shown in Fig. 2, the third layer performs deconvolution of the multi-scale feature maps extracted by the second layer.”)
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of modified Graham and the method of Li in order to enlarge the multi-scale feature maps so that the enlarged features can be used to reconstruct the HR image (IV. CNN-BASED UP-SAMPLING, A. CNN for Luma Up-Sampling, 2) Deconvolution, p.2320).

Regarding claim 13, modified Graham teaches the method of Claim 1, but does not explicitly teach wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more upsampling operations to the one or more active sites.
However, Li teaches:
wherein training the machine-learning model comprises: applying, for each of the plurality of content objects, one or more upsampling operations to the one or more active sites. (VI. TWO-STAGE UP-SAMPLING, p.2324; The second stage of up-sampling is performed for only the CTUs that have chosen the low-resolution coding mode, and the up-sampling method (CNN-based or DCTIF) is already decided in the first stage. The up-sampling result of the second stage just replaces that of the first stage.)
Further, it would have been obvious to one of ordinary skill in the art, prior to the effective filing date, to combine the method of modified Graham and the method Li in  (I. INTRODUCTION).

Conclusion
7.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHAT MINH DANG whose telephone number is (571)272-8665. The examiner can normally be reached Monday - Friday 7:30am - 5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197.





/P.M.D./Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121