DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments filed on 1/28/2021 with respect to claims 1-6 and 22-29 have been considered. Applicant’s amendment necessitated the new grounds of rejection presented below, which introduce the new references Erhan (US Patent 9,373,057 B1), EL-Khamy (US PGPUB 2017/0344808 A1, hereinafter Khamy) and Vallespi (US PGPUB 2019/0171912 A1).
Applicant's arguments with respect to claims 7-13 have been fully considered but they are not persuasive. Regarding independent claim 7, Applicant argues that “Further, for at least some similar reasons as amended independent claim 1, amended independent claim 7 and each claim depending therefrom rejected under this section is also patentably distinguishable over the cited references. Thus, withdrawal of the rejection is respectfully requested” (please see Remarks, page 11, last paragraph). Examiner respectfully disagrees, as claim 7 does not recite limitations similar to those presented in claim 1. Claim 7 is silent with respect to “representative of a probability that an object is depicted in the frame and that the cluster corresponds to the object in the frame”, “the cluster corresponds to the object in the frame” and “clustering the detected objects into one or more clusters based at least in part on distances between the locations”. Hence, in view of the broad language of claim 7, it is Examiner's position that the sensor data can include information that describes the location of objects within the surrounding environment of the autonomous vehicle 10; the sensor data provided to first neural network 306 therefore includes the location information of the object, and the detected object (306 output) is provided to second neural network 308.
Further, as previously cited by the Examiner, Frossard in paragraphs 52, 53 and 57 discloses “one or more sensors 101 can be used to collect sensor data that includes information that describes the location (e.g., in three-dimensional space relative to the autonomous vehicle 10) of points that correspond to objects within the surrounding environment of the autonomous vehicle 10”; please also see paragraph 73, which discloses “Object matching component 308 includes one or more second neural networks configured to receive object detections from object detection component 306, and provide a match score for each pair of object detections….” Hence, the second neural network receives detected object data and provides pairs of object detections (which corresponds to clustering). Therefore, Frossard in view of Vallespi reads on the argued limitations as presented by Applicant. Examiner suggests that Applicant further elaborate on how the clustering is performed in order to overcome the cited references.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 25-26 and 28-29 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Erhan (US Patent 9,373,057 B1).

As per claim 25, Erhan discloses a computer-implemented system (Erhan, Figs. 1-3) comprising: 
one or more processors (Erhan, Column 7, lines 40-44); and 
one or more memory devices that store instructions that, when executed by the one or more processors, cause the one or more processors to execute operations (Erhan, Column 7, lines 40-51) comprising: 
determining, from detected objects in a field of view of at least one sensor (Erhan, Column 3, lines 31-32, discloses “object detection neural network 102 is a neural 
determining, from the plurality of detected objects, a bounding shape of the aggregated detection (Erhan, Fig. 3:302:304); 
determining a feature of the aggregated detection from the plurality of the detected objects for use as an input of a neural network (Erhan, Fig. 1:102, and Column 3, lines 31-41, and Column 6, lines 35-44); and 
receiving output data representative of a confidence score computed by the neural network based at least in part on the input, the confidence score representative of a probability that the bounding shape corresponds to an object within the field of view (Erhan, Column 2, lines 56-64, discloses “A neural network can be trained to effectively predict multiple bounding boxes in an input image, with the confidence score assigned to each bounding box by the neural network accurately reflecting the likelihood that the bounding box contains an image of an object.  Additionally, the neural network can be trained to predict the bounding boxes and generate accurate confidence scores while being agnostic to the object category that the objects contained in the bounding boxes belong to”).


As per claim 26, Erhan further discloses the system of claim 25, wherein each of the detected objects correspond to a same frame of the field of view and the probability is that the bounding shape corresponds to the object within the frame of the field of view (Erhan, Fig. 1:102, and Column 3, lines 31-34, discloses “object detection neural network 102 is a neural network that is configured to receive an input image and output bounding box data that defines a predetermined number of candidate bounding boxes within the input image.”).

As per claim 28, Erhan further discloses the system of claim 25, wherein the neural network is trained to minimize the probability when the bounding shape is a false positive and maximize the probability when the bounding shape is a true positive (Erhan, Column 2, lines 56-61, discloses “A neural network can be trained to effectively predict multiple bounding boxes in an input image, with the confidence score assigned to each bounding box by the neural network accurately reflecting the likelihood that the bounding box contains an image of an object…”).

As per claim 29, Erhan further discloses the system of claim 25, wherein the feature comprises a count of the plurality of the detected objects that correspond to the aggregated detection (Erhan, Column 6, lines 35-44).




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over EL-Khamy (US PGPUB 2017/0344808 A1, hereinafter Khamy) and Vallespi (US PGPUB 2019/0171912 A1).

As per claim 1, Khamy discloses a method (Khamy, Fig. 2:200, and Fig. 3:300) comprising: 
applying, to a first neural network (Khamy, Fig. 3:320), sensor data representative of a frame that depicts a field of view of at least one sensor of an environment (Khamy, paragraphs 50-51, discloses “The input interface 310 may receive a single input image… The output of the input interface 310 may be coupled to an input of the face detection classification network 320.”); 

clustering the detected objects into one or more clusters based at least in part on distances between the locations (Khamy, paragraphs 25, 35 and 52-53); 
determining features of the cluster for use as inputs of a second neural network (Khamy, Fig. 3:370:380, and paragraph 55); 
receiving output data representative of a confidence score computed by the second neural network based at least in part on the inputs (Khamy, paragraph 55), the confidence score representative of a probability that an object is depicted in the frame and that the cluster corresponds to the object in the frame (Khamy, paragraphs 28 and 55, discloses “The softmax layer 380 may compute a probability distribution for each bounding box over the class of persons used for training”).
Although Khamy discloses in paragraph 36, “at a first stage the classification score at the first level may correctly identify an object (i.e., car, person, background), and at a second level may further classify the object class into a brand, and model (i.e., a car into a Ford Mustang and a person into a male or a female)”, Khamy does not explicitly disclose sensor data representative of a vehicle in an environment. However, said limitation would have been obvious in view of Khamy's teachings in paragraph 36. Further, said limitation is well known in the art; for instance, Vallespi discloses sensor data representative of a vehicle in an environment (Vallespi, Fig. 3, and paragraph 20).

The motivation would be to provide a system which enables improvements in safety through earlier and more accurate object detection (paragraph 50), as taught by Vallespi.

As per claim 2, Khamy in view of Vallespi discloses the method of claim 1, further comprising: determining at least a first detected object and a second detected object are a same object depicted across sequential frames represented by the sensor data (Khamy, paragraphs 28 and 36, discloses “Multiple keypoint detections of the same object may be addressed by treating the keypoints as different classes, posing a loss function to minimize a classification problem by assigning scores to each keypoint, and combining keypoints of the same class”); and 
computing at least one value of the same object based at least in part on the first detected object and the second detected object (Khamy, paragraph 28), wherein at least one of the features corresponds to the at least one value based at least in part on the cluster being associated with the same object (Khamy, paragraphs 28 and 55).

As per claim 3, Khamy in view of Vallespi discloses the method of claim 1, wherein the detected objects of the cluster comprise detected object regions (Khamy, paragraph 20, discloses “the face detection regression network 130 may use the features calculated for the face classification, and may operate on the regions that are 

As per claim 4, Khamy in view of Vallespi discloses the method of claim 1, wherein one or more of the features is based at least in part on vehicle state data representative of a state of the vehicle based at least in part on additional sensor data received from one or more of the at least one sensor or at least one alternative sensor (Vallespi, paragraphs 25, 62 and 73).

As per claim 5, Khamy in view of Vallespi discloses the method of claim 1, wherein the detected objects of the cluster comprise a detected object region (Khamy, paragraph 20), and one or more of the features is based at least in part on computing a statistic of one or more of input pixels to the first neural network (Khamy, paragraph 22) used to determine at least one of: the detected object data, or features of at least one layer of the first neural network (Khamy, paragraphs 20 and 22, discloses “The hierarchical convolutional features may be pixel-wise probabilities that form a feature map, or a heat map, of likelihoods that a given pixel is part of one or more objects of interest”).

As per claim 6, Khamy in view of Vallespi discloses the method of claim 1, wherein generating a cluster comprises clustering the detected objects based at least in part on coverage values of the detected objects (Khamy, paragraphs 27 and 32), each coverage value indicating a likelihood the detected object corresponds to an object .



Claims 7-13 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Frossard (US PGPUB 2019/0147610 A1) and Vallespi (US PGPUB 2018/0349746 A1).

As per claim 7, Frossard discloses a method comprising: 
determining, based at least in part on sensor data representative of a field of view of at least one sensor, detected object data representative of locations of detected objects in the field of view (Frossard, Fig. 3:306, and paragraphs 5, 52, 53, 57 and 70-73, discloses “receiving, by a computing system comprising one or more computing devices, sensor data from one or more sensors configured to generate sensor data associated with an environment”);
generating a cluster of the detected objects based at least in part on the locations (Frossard, paragraphs 52, 53 and 57, discloses “one or more sensors 101 can be used 
determining (features of) the cluster for use as inputs to a neural network (Frossard, Fig. 3:308, second neural network); and 
receiving output data representative of a confidence score computed by the neural network based at least in part on the inputs (Frossard, paragraphs 72 and 73), the confidence score representative of a probability that an object is at least partially depicted in the field of view and that the cluster corresponds to the object (Frossard, paragraphs 72 and 73).
Frossard does not explicitly disclose determining features for the cluster (for use as inputs of a second neural network); however, the above feature would have been obvious in view of Frossard's teachings in paragraphs 57, 94 and 95. Furthermore, determining features for the cluster is well known in the art; for instance, Vallespi discloses determining features for the cluster (for use as inputs of a second neural network) (Vallespi, paragraphs 33, 35 and 92, disclosing feature determination for a cluster).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Frossard's teachings by implementing a feature extraction unit in the system, as taught by Vallespi.
The motivation would be to provide a system of an autonomous vehicle with improved object classification and detection (paragraph 46), as taught by Vallespi.

As per claim 8, Frossard in view of Vallespi discloses the method of claim 7, wherein the neural network is a multi-layer perceptron neural network (Khamy, paragraph 26).

As per claim 9, Frossard in view of Vallespi discloses the method of claim 7, wherein the locations of the object detections are represented by outputs of a convolutional neural network that determines the locations based at least in part on the sensor data (Frossard, paragraphs 52, 53 and 57).

As per claim 10, Frossard in view of Vallespi discloses the method of claim 7, wherein the at least one sensor is of a vehicle and one or more of the features is based at least in part on distance data representative of a distance of the vehicle from the object (Vallespi, paragraphs 47 and 82), the distance data based at least in part on additional sensor data received from one or more of the at least one sensor or at least one alternative sensor of the vehicle (Vallespi, paragraph 82, LIDAR).

As per claim 11, Frossard in view of Vallespi discloses the method of claim 7, wherein at least one of the features is based at least in part on coverage values of the detected objects of the cluster, each coverage value indicating, for a detected object, a likelihood the detected object corresponds to an object depiction in the field of view (Vallespi, paragraph 34, discloses “the classification for each cell can include a 

As per claim 12, Frossard in view of Vallespi discloses the method of claim 7, wherein at least one of the features is based at least in part on one or more of a height of a detected object region that corresponds to the detected objects of the cluster, a width of the detected object region, a central location of the detected object region, or a number of the detected objects of the cluster (Vallespi, paragraph 89).

As per claim 13, Frossard in view of Vallespi discloses the method of claim 7, wherein one or more of the features is based at least in part on at least one estimated parameter of a ground plane in the field of view (Vallespi, paragraphs 61 and 114).

As per claim 23, Frossard in view of Vallespi discloses the method of claim 7, wherein the generating the cluster of the detected objects comprises clustering the detected objects into a plurality of clusters using a clustering algorithm that is based at least on similarities between the locations and the cluster is of the plurality of clusters (Vallespi, paragraphs 36, 38 and 58).





Claims 22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Frossard (US PGPUB 2019/0147610 A1) and Vallespi (US PGPUB 2018/0349746 A1) and further in view of EL-Khamy (US PGPUB 2017/0344808 A1, hereinafter Khamy).

As per claim 22, Frossard in view of Vallespi discloses the method of claim 7.
Frossard in view of Vallespi does not explicitly disclose that the plurality of detected objects of the cluster are detected in a same frame representing the field of view.
Khamy discloses that the plurality of detected objects of the cluster are detected in a same frame representing the field of view (Khamy, Fig. 3:340:350, and paragraphs 52-53).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Frossard in view of Vallespi by implementing an ROI pooler in the system, as taught by Khamy.
The motivation would be to provide an improved system with increased computational speed and freeing hardware resources (paragraph 17), as taught by Khamy.

As per claim 24, Frossard in view of Vallespi discloses the method of claim 7, wherein the neural network is a second neural network (Frossard, Fig. 3:308, second neural network) and the detected object data is output from a first neural network (Frossard, Fig. 3:306). Frossard in view of Vallespi does not explicitly disclose (first 
Khamy discloses (first neural network) that is trained to output multiple detected objects for a same object within a frame of input data (Khamy, paragraphs 28, 36 and 51).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Frossard in view of Vallespi by training the CNN, as taught by Khamy.
The motivation would be to provide an improved system with increased computational speed and freeing hardware resources (paragraph 17), as taught by Khamy.


Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Erhan (US Patent 9,373,057 B1) and Karasev (US PGPUB 2019/0147600 A1).

As per claim 27, Erhan discloses the system of claim 25. Erhan does not explicitly disclose that the feature comprises one or more dimensions of the bounding shape and the one or more dimensions are computed from dimensions of bounding shapes of the plurality of the detected objects.
Karasev discloses one or more dimensions of the bounding shape and the one or more dimensions are computed from dimensions of bounding shapes of the plurality of the detected objects (Karasev, paragraphs 31, 39 and 49).

The motivation would be to improve accuracy and performance of the system (paragraph 61), as taught by Karasev.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SYED Z HAIDER whose telephone number is (571)270-5169.  The examiner can normally be reached on MONDAY-FRIDAY 9-5:30 EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SAM K Ahn can be reached on 571-272-3044.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SYED HAIDER/
Primary Examiner, Art Unit 2633