DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-15 are pending in the application.

Claim Objections
Claim 1 (2nd line from bottom): “the processor” lacks antecedent basis.
Claim 5: “the second portion” lacks antecedent basis.

Comments on Claim Interpretation
Claim 1 is reproduced in the following (annotation added).
1. A method of building a depth-based object-detection convolutional neural network, comprising: 
storing, in a memory, data and program instructions corresponding to a convolutional neural network, including: 
a base network configured to receive RGB image data as input and compute output data indicative of at least one feature of an object in the received RGB image data, the base network pre-trained to compute feature detections using an RGB image dataset; and 
additional structure configured to receive the output data of the base network as input and compute predictions of a location of a region in the received RGB image that includes the object and of a class of the object, such that the object detection convolutional neural network is configured to receive RGB test image data as input and compute the predictions as output; 
storing, in the memory, a dataset of training depth images, each including at least one annotation that localizes a region of a respective depth image as containing a training object and identifies a class of the training object; 
generating a training dataset for the object detection convolutional neural network by reformatting each image in the dataset of training depth images as an RGB image; and 
training the object detection convolutional neural network with the processor using the training dataset to form a depth-based object-detection convolutional neural network configured to receive a depth image formatted as RGB image data as input and compute predictions of a location of a region in the received depth image that includes a test object and of a class of the test object as output.

	Claim 1 recites a convolutional neural network (underlined part), which includes a base network and additional structure. Claim 1 further recites training a depth-based object-detection convolutional neural network (bold text). It is noted that the claim recites no connection between the two neural networks, although they look very similar in structure.
	Claim 10 is a corresponding system claim with respect to method claim 1.
	Claim 11 recites a system for object localization and classification using a trained depth-based object-detection convolutional neural network. Claim 12 further details the depth-based object-detection convolutional neural network, i.e., comprising a base network and additional structure.

	The Examiner believes that the above claims do not reflect the inventive subject matter disclosed in the specification. The current application is directed to training (or running inference with) a depth-based object-detection convolutional neural network using a pre-trained RGB-based object-detection CNN. The pre-trained RGB-based object-detection CNN, such as ZFNet, ThiNet, or VGG16, is trained on a large amount of training data. Reformatting depth data into RGB format as input and retraining the pre-trained RGB-based object-detection CNN yields a trained depth-based object-detection CNN. In this way, the pre-trained RGB CNN is transferred efficiently into another domain (e.g., depth) without requiring a large amount of training data.
  	Claim 1 recites “reformatting each image in the dataset of training depth images as an RGB image”. This does not necessarily create a link between the two CNNs. 
 	Similarly, in claim 11 the phrase “reformat the depth image data as RGB image data” does not mean there exists a pre-trained RGB-based CNN as in claim 12. 
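For illustration only, the reformatting step discussed above can be sketched as follows. This is a minimal sketch, not the method of the application or of any cited reference: it assumes the depth image is a list of rows of distances, scales each distance to 0..255, and replicates the value across three channels. The function name `depth_to_rgb` and the `max_depth` parameter are hypothetical.

```python
def depth_to_rgb(depth, max_depth=None):
    """Reformat a single-channel depth map as an 8-bit, three-channel image.

    Assumption: `depth` is a list of rows of non-negative distances. Each
    distance is scaled to 0..255 and copied into the R, G and B channels.
    """
    if max_depth is None:
        # Default to the largest distance present in the image.
        max_depth = max(d for row in depth for d in row)
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")
    rgb = []
    for row in depth:
        rgb_row = []
        for d in row:
            clipped = min(max(d, 0.0), max_depth)  # clamp to [0, max_depth]
            v = round(255 * clipped / max_depth)   # scale to 8-bit range
            rgb_row.append((v, v, v))              # replicate across channels
        rgb.append(rgb_row)
    return rgb


depth = [[0.0, 1.0], [3.0, 5.0]]
rgb = depth_to_rgb(depth)
```

The output has the same height and width as the input but three channels per pixel, which is the input shape expected by a network pre-trained on RGB data. Nothing in this step by itself ties the result to a particular pre-trained network.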

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Saleh et al. (Saleh, K., et al., “Cyclist Detection in LIDAR Scans Using Faster R-CNN and Synthetic Depth Images”, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), October 16, 2017, hereafter Saleh), in view of Sharma et al. (US Publication 2017/0032222 A1, hereafter Sharma).
		As per claim 1, Saleh teaches the invention substantially as claimed including a method (Abstract) of building a depth-based object-detection convolutional neural network, comprising: 
storing, data corresponding to a convolutional neural network, including: 
a base network configured to receive RGB image data as input and compute output data indicative of at least one feature of an object in the received RGB image data, the base network pre-trained to compute feature detections using an RGB image dataset; and 
additional structure configured to receive the output data of the base network as input and compute predictions of a location of a region in the received RGB image that includes the object and of a class of the object, such that the object detection convolutional neural network is configured to receive RGB test image data as input and compute the predictions as output 
(Saleh first introduces general convolutional neural networks (ConvNets) (page 2 subsection B), then turns to a state-of-the-art ConvNet architecture for object detection, region-based convolutional neural networks (R-CNN). R-CNNs consist of two modules that constitute a unified single generic object detection network trainable in end-to-end fashion. The first module (corresponding to the recited “base network”) is the region proposal module, which provides a set of regions of interest (RoIs) in the input image (corresponding to extracted features), typically a set of rectangular windows. The resultant RoIs are then fed into the second module (corresponding to the recited “additional structure”), which is a fully convolutional neural network in which the spatial size of the RoIs is reduced into smaller feature maps using a max RoI pooling layer. The RoI layer feature maps are then flattened into a feature vector by fully connected (FC) layers. In the last layer of the R-CNN, a multi-task loss layer serves as the training objective function, followed by softmax and regression layers for the object class score (corresponding to the recited “class of object”) and the bounding box positions (corresponding to the recited “location of a region”), respectively, in the input image (page 3 subsection C first para.). The R-CNN Saleh chooses is a Faster R-CNN (page 3 left col. subsection C 2nd para.). Saleh further teaches that the input to the regular ConvNets is RGB image data (page 3 right col. last 5 lines)),
storing, a dataset of training depth images, each including at least one annotation that localizes a region of a respective depth image as containing a training object and identifies a class of the training object (page 2 left col. 2nd para. “Faster R-CNN is trained on labelled synthetic depth images with bounding boxes of cyclists in each depth image”; page 3 right col. subsection A teaching generating synthetic depth images and labelling ground truth bounding boxes); 
generating a training dataset for the object detection convolutional neural network by reformatting each image in the dataset of training depth images as an RGB image (Saleh uses pre-trained Faster R-CNN as a base to train a cyclist detection Faster R-CNN based on depth data. In order to adapt to the pre-trained model, Saleh colorizes single channel depth images and converts them into the same format as RGB image data. See Fig. 2; page 3 right col. last 8 lines to page 4 left col. line 6; page 4 left col. section IV. 1st para. “The input to the architecture will be a colourised depth image computed according to procedure discussed Section III-B, with a resolution of 640W ×480H×3”); and 
training the object detection convolutional neural network using the training dataset to form a depth-based object-detection convolutional neural network configured to receive a depth image formatted as RGB image data as input and compute predictions of a location of a region in the received depth image that includes a test object and of a class of the test object as output (Fig. 3; pages 4-5 section IV for training; page 5 section V showing results: using 8k depth images for training and 2k for testing; Fig. 4 showing detection result (bounding boxes representing locations and a binary value representing class (page 5 left col. line 6 “two binary classes (either an object or not)”))).
	Saleh teaches every limitation as recited in claim 1 except for some computing elements, i.e., memory, program instructions and processor.
	Sharma in the same field of endeavor discloses a method for object detection and classification using CNN based on depth data (FIG. 2; para. [0062]). Sharma discloses computing elements, such as memory, program instructions and processor (FIG. 3; para. [0051]-[0053]).
Taking the combined teachings of Saleh and Sharma as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider using a computing system comprising computing elements as disclosed by Sharma in order to implement a computer-implemented method for object detection and classification. 
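For illustration only, the max RoI pooling operation cited above from Saleh (reducing the spatial size of a region of interest into a smaller feature map) can be sketched as follows. This is a simplified sketch under stated assumptions, not Saleh's implementation: the feature map is a single-channel list of rows, the RoI is given as end-exclusive integer bounds, and the function name `roi_max_pool` is hypothetical.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of interest down to an out_h x out_w grid.

    Assumption: `roi` is (top, left, bottom, right) in feature-map cells,
    with bottom and right exclusive, and is at least out_h x out_w in size.
    """
    top, left, bottom, right = roi
    h, w = bottom - top, right - left
    if h < out_h or w < out_w:
        raise ValueError("RoI is smaller than the requested output grid")
    pooled = []
    for i in range(out_h):
        # Row span of the i-th output bin.
        r0 = top + (i * h) // out_h
        r1 = top + ((i + 1) * h) // out_h
        row = []
        for j in range(out_w):
            # Column span of the j-th output bin.
            c0 = left + (j * w) // out_w
            c1 = left + ((j + 1) * w) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
whole = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)
inner = roi_max_pool(feature_map, (1, 1, 3, 3), 2, 2)
```

Whatever the size of the RoI, the pooled output has a fixed size, which is what allows the subsequent fully connected layers to produce the class score and bounding box outputs.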

Claim 10, an independent system claim, is rejected as applied above to method claim 1.

As per claim 11, an independent claim,  Saleh in view of Sharma teaches an object detection system (Sharma FIG. 3) for localizing an object within a scene and identifying a classification for the object, the device comprising: 
a data storage device (Sharma FIG. 3) that stores data and program instructions corresponding to a depth-based object-detection convolutional neural network, the depth-based object-detection convolutional neural network configured to receive a depth image formatted as RGB image data as input and compute predictions of a location of a region in the received depth image that includes a test object and of a class of the test object as output (See rejections applied to claim 1); 
a depth sensor (Sharma para. [0041], [0062]) configured to sense distances to various points on surfaces in a scene and compute depth image data with reference to the sensed distances (Sharma para. [0041], [0062]); 
an output component (Sharma FIG. 3); and 
a processor (Sharma FIG. 3) operatively connected to the data storage device, the depth sensor, and the output component, and configured to: 
operate the depth sensor to collect depth image data (Sharma para. [0041], [0062]); 
reformat the depth image data as RGB image data; 
determine a prediction of a location of a region in the depth image data that includes the object and identify a classification for the object by feeding the corresponding RGB image data through the depth-based object-detection convolutional neural network; and 
output the location and classification of the object via the output component (See rejections applied to claim 1).

As per claim 12, dependent upon claim 11, Saleh in view of Sharma teaches the depth-based object-detection convolutional neural network includes: 
a base network configured to receive a depth image formatted as RGB image data as input and compute output data indicative of at least one feature of an object in the RGB image data; and 
additional structure configured to receive the output data of the base network as input and compute predictions of the location of a region in the received depth image that includes the test object and of the class of the test object as output (See rejections as applied to claim 1 above).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Saleh et al. (Saleh, K., et al., “Cyclist Detection in LIDAR Scans Using Faster R-CNN and Synthetic Depth Images”, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), October 16, 2017, hereafter Saleh), in view of Sharma et al. (US Publication 2017/0032222 A1, hereafter Sharma), as applied above to claim 1, and further in view of Jiang et al. (Jiang, L., et al., “Self-Paced Curriculum Learning”, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Jan. 2015, listed in IDS, hereafter Jiang).
As per claim 2, Saleh in view of Sharma does not teach the claimed limitations.
Jiang discloses a self-paced curriculum learning method (Abstract). Jiang teaches a complexity metric for each training sample, the complexity metric indicative of a feature complexity of the respective sample, and inputting the training samples to a training model according to a curriculum that orders samples by ascending complexity metric (page 1 left col. last 3 lines: “A curriculum determines a sequence of training samples which essentially corresponds to a list of samples ranked in ascending order of learning difficulty”).
Taking the combined teachings of Saleh, Sharma and Jiang as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider providing each training sample a complexity and ranking the samples as disclosed by Jiang in order to establish a learning curriculum that can efficiently train a model. 
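For illustration only, ordering training samples by ascending complexity metric can be sketched as follows. This sketch shows only the ranking step, not Jiang's self-paced curriculum learning as a whole. The metric used here (mean absolute difference between horizontally adjacent pixels) and the names `edge_complexity` and `build_curriculum` are hypothetical and are taken neither from Jiang nor from the application.

```python
def edge_complexity(image):
    """Hypothetical complexity metric: mean absolute difference between
    horizontally adjacent pixels of a single-channel image."""
    diffs = [abs(row[k + 1] - row[k])
             for row in image
             for k in range(len(row) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def build_curriculum(samples):
    """Return the samples ranked in ascending order of complexity."""
    return sorted(samples, key=lambda s: edge_complexity(s["image"]))


samples = [
    {"name": "busy", "image": [[0, 9, 0], [9, 0, 9]]},
    {"name": "flat", "image": [[5, 5, 5], [5, 5, 5]]},
    {"name": "ramp", "image": [[0, 1, 2], [1, 2, 3]]},
]
curriculum = [s["name"] for s in build_curriculum(samples)]
```

The resulting list would be fed to the training model in order, so that the least complex samples are seen first.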

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Saleh et al. (Saleh, K., et al., “Cyclist Detection in LIDAR Scans Using Faster R-CNN and Synthetic Depth Images”, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), October 16, 2017, hereafter Saleh), in view of Sharma et al. (US Publication 2017/0032222 A1, hereafter Sharma), as applied above to claim 1, and further in view of Park (US Publication 2020145661 A1, hereafter Park).
As per claim 4, Saleh in view of Sharma does not teach the recited limitations, especially the filters.
Park discloses a convolutional neural network with a plurality of layers. Specifically, the CNN includes a plurality of convolutional filters distributed throughout a sequence of layers (FIG. 5C; para. [0123]-[0125]); 
each convolutional filter in the plurality of convolutional filters is configured to receive input data and compute convolutional output data by convolving a respective matrix of weights over the input data (FIG. 5C; para. [0123]-[0125]); and
each layer in the sequence of layers is configured to receive input data and compute output data formed by a combination of the output data of each filter in the layer (FIG. 5C; para. [0123]-[0125]).
Saleh further teaches that the base network includes a first subset of the sequence of layers; and the additional structure includes a second subset of the sequence of layers (page 2 right col. last para.; page 3 left col. subsection C 1st para.).
Taking the combined teachings of Saleh, Sharma and Park as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider including filters and performing the operations of each layer as disclosed by Park in order to extract features throughout the plurality of layers of a CNN.
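For illustration only, the filter and layer operations recited in claim 4 (each filter convolving a matrix of weights over the input, and each layer combining the outputs of its filters) can be sketched as follows. This is a minimal sketch, not Park's implementation: it assumes a single-channel input, no padding, a stride of one, and the sliding-window product commonly used in CNN layers. The names `conv2d` and `conv_layer` are hypothetical.

```python
def conv2d(image, kernel):
    """Slide a weight matrix over a single-channel image (no padding,
    stride 1) and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh)
                 for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def conv_layer(image, kernels):
    """Combine the outputs of all filters in a layer by stacking them,
    one feature map per filter."""
    return [conv2d(image, k) for k in kernels]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [
    [[1, 0], [0, 1]],   # sums each pixel with its lower-right neighbour
    [[1, -1], [0, 0]],  # horizontal difference
]
layer_out = conv_layer(image, kernels)
```

Here the layer output is a stack of two 2x2 feature maps, one per filter; a subsequent layer would take that stack as its input.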

Claims 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Saleh et al. (Saleh, K., et al., “Cyclist Detection in LIDAR Scans Using Faster R-CNN and Synthetic Depth Images”, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), October 16, 2017, hereafter Saleh), in view of Sharma et al. (US Publication 2017/0032222 A1, hereafter Sharma), as applied above to claim 11, and further in view of OH (US Publication 2014/0063026 A1).
As per claim 13, Saleh in view of Sharma does not teach the claimed limitations.
OH is evidence that a system-on-a-chip that includes a central processing unit and a graphics processing unit embedded with the central processing unit is well known and practiced (para. [0019]).
Taking the combined teachings of Saleh, Sharma and OH as a whole, it would have been obvious for a person with ordinary skill in the art before the effective filing date of the claimed invention to consider including a system-on-a-chip as disclosed by OH in order to make a compact learning system. 

As per claim 14, dependent upon claim 13, Saleh in view of Sharma and OH teaches that the processor is configured to operate the depth sensor to collect depth image data (Sharma FIG. 3; para. [0051]-[0053]; para. [0062]) and operate the output component to output the location and classification of the object via the output component at a rate in excess of 1 frame per second (Sharma para. [0062] “video frames”).

As per claim 15, dependent upon claim 14, Saleh in view of Sharma and OH teaches the processor is configured to determine prediction of locations of a plurality of objects in the image data and a respective identification for each object (Saleh Fig. 1 lower panel showing two bounding boxes).

Allowable Subject Matter
Claims 3 and 5-9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XUEMEI G CHEN whose telephone number is (571)270-3480.  The examiner can normally be reached on Monday-Friday 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at 571-272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XUEMEI G CHEN/Primary Examiner, Art Unit 2664