DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 14-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because Applicant has provided evidence that Applicant intends the term “computer readable storage medium” to include non-statutory matter. Applicant describes a computer-readable storage medium using open-ended language, and thus it is reasonable to interpret the term to include all possible media, including non-statutory media (see paragraph [0015] of the instant specification: “The method 40 may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc.,…”). The words “storage” and/or “recording” are insufficient to convey only statutory embodiments to one of ordinary skill in the art absent an explicit and deliberate limiting definition or a clear differentiation between storage media and transitory media in the disclosure. As such, the claims are drawn to a form of energy. Energy is not one of the four categories of invention, and therefore these claims are not statutory. Energy is not a series of steps or acts and thus is not a process. Energy is not a physical article or object and as such is not a machine or manufacture. Energy is not a combination of substances and therefore is not a composition of matter.
Furthermore, in determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidelines. (2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50, Jan. 7, 2019.)
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? No, because Applicant has provided evidence that Applicant intends the term “computer readable storage medium” to include non-statutory matter. Applicant describes a computer-readable storage medium using open-ended language, and thus it is reasonable to interpret the term to include all possible media, including non-statutory media (see above).
Dependent claims 15-19 are rejected for containing the same non-statutory subject matter as independent base claim 14, upon which claims 15-19 depend.
The Examiner suggests amending the claims to instead recite a “non-transitory computer-readable storage medium.”

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-25 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Zeng et al. (US 2019/0361454 A1, hereinafter Zeng).
Regarding claims 1, 7, 14, and 20, taking claim 1 as exemplary:
Zeng shows:
“A computing system comprising: a network controller to receive a first video frame and a second video frame that is subsequent to the first video frame;” (Paragraph [0114]: “The feature extraction CNN 130 receives the sensor data 129 as an input layer 222. The sensor data 129 can include image data 212 and range point data 214. The image data 212 can include an image that includes pixel information or data (e.g., pixels) obtained via cameras.” In paragraph [0060]: “In various embodiments, the operating environment 50 further includes one or more user devices 54 that communicate with the autonomous vehicle 10 and/or the remote transportation system 52 via a communication network 56.” In paragraph [0061]: “The communication network 56 supports communication as needed between devices, systems, and components supported by the operating environment 50 (e.g., via tangible communication links and/or wireless communication links).” paragraph [0068]: “In accordance with various embodiments, the controller 34 implements a high-level controller of an autonomous driving system (ADS) 33 as shown in FIG. 3. That is, suitable software and/or hardware components of the controller 34 (e.g., the processor 44 and the computer-readable storage device 46) are utilized to provide a high-level controller of an autonomous driving system 33 that is used in conjunction with vehicle 10. The high-level controller of the autonomous driving system 33 will be described in greater detail below with reference to FIGS. 4 and 5” in paragraph [0114]: “The feature extraction CNN 130 receives the sensor data 129 as an input layer 222. The sensor data 129 can include image data 212 and range point data 214. The image data 212 can include an image that includes pixel information or data (e.g., pixels) obtained via cameras.” and in paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. 
Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” In paragraph [0117]: “The perception map generator module 134 generates the perception map 141 based on the feature map 132. The perception map is a human-readable representation of the driving environment that includes scenes being acquired via the sensor system 128 at any given instant. As will be described below, the perception map 141 includes multiple elements including: object (bounding boxes) locations, orientations, velocities (represented by 141-A); a freespace grid or image segmentation of freespace (represented by 141-B); road feature locations/types (represented by 141-C); and stixels (represented by 141-D).” – The controller for autonomous driving of Zeng, which is connected to a communications network and a remote transportation system including a network-based system for operating autonomous driving remotely, and which collects camera data at different time instances to detect objects, is the network controller to receive a first video frame and a second video frame that is subsequent to the first video frame.)
“a processor coupled to the network controller;” (Paragraph [0060]: “In various embodiments, the operating environment 50 further includes one or more user devices 54 that communicate with the autonomous vehicle 10 and/or the remote transportation system 52 via a communication network 56.” In paragraph [0061]: “The communication network 56 supports communication as needed between devices, systems, and components supported by the operating environment 50 (e.g., via tangible communication links and/or wireless communication links).” In paragraph [0065]: “The remote transportation system 52 includes one or more backend server systems, which may be cloud-based, network-based, or resident at the particular campus or geographical location serviced by the remote transportation system 52. The remote transportation system 52 can be manned by a live advisor, or an automated advisor, or a combination of both. The remote transportation system 52 can communicate with the user devices 54 and the autonomous vehicles 10a-10n to schedule rides, dispatch autonomous vehicles 10a-10n, and the like. In various embodiments, the remote transportation system 52 stores account information such as subscriber authentication information, vehicle identifiers, profile records, behavioral patterns, and other pertinent subscriber information.” And in paragraph [0068]: “In accordance with various embodiments, the controller 34 implements a high-level controller of an autonomous driving system (ADS) 33 as shown in FIG. 3. That is, suitable software and/or hardware components of the controller 34 (e.g., the processor 44 and the computer-readable storage device 46) are utilized to provide a high-level controller of an autonomous driving system 33 that is used in conjunction with vehicle 10. The high-level controller of the autonomous driving system 33 will be described in greater detail below with reference to FIGS. 
4 and 5” – The controller for autonomous driving of Zeng, which is connected to a communications network and a remote transportation system including a network-based system, is the processor coupled to the network controller.)
“and a memory coupled to the processor, wherein the memory includes a set of instructions, which when executed by the processor, cause the computing system to: generate, by a full inference path of a neural network, a first detection result associated with one or more objects in the first video frame,” (Paragraph [0068]: “In accordance with various embodiments, the controller 34 implements a high-level controller of an autonomous driving system (ADS) 33 as shown in FIG. 3. That is, suitable software and/or hardware components of the controller 34 (e.g., the processor 44 and the computer-readable storage device 46) are utilized to provide a high-level controller of an autonomous driving system 33 that is used in conjunction with vehicle 10. The high-level controller of the autonomous driving system 33 will be described in greater detail below with reference to FIGS. 4 and 5” and in paragraph [0122]: “The fast-convolutional neural network (R-CNN) 246 is a state-of-the-art visual object detection system that combines bottom-up region bounding box proposals with rich features computed by a convolutional neural network. The fast-convolutional neural network (R-CNN) 246 processes the image data from the feature map for the regions of interest to detect and localize objects, and classify the detected objects within the perception map 141. Objects that are detected can be classified according to semantic classes, for example, pedestrians, vehicles, etc.” paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. 
In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132” – The detection of objects, together with the previous vision-based feature map of Zeng, is the full inference path.)
“detect the second video frame,” (Paragraph [0114]: “The feature extraction CNN 130 receives the sensor data 129 as an input layer 222. The sensor data 129 can include image data 212 and range point data 214. The image data 212 can include an image that includes pixel information or data (e.g., pixels) obtained via cameras.” In paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” And in paragraph [0117]: “The perception map generator module 134 generates the perception map 141 based on the feature map 132. The perception map is a human-readable representation of the driving environment that includes scenes being acquired via the sensor system 128 at any given instant. As will be described below, the perception map 141 includes multiple elements including: object (bounding boxes) locations, orientations, velocities (represented by 141-A); a freespace grid or image segmentation of freespace (represented by 141-B); road feature locations/types (represented by 141-C); and stixels (represented by 141-D).” – The image data via cameras at different time instances of Zeng is the second video frame.)
“and generate, by a partial inference path of the neural network, a second detection result based on the first detection result, wherein the second detection result corresponds to the second video frame.” (Paragraph [0022]: “In one embodiment, the perception map generator module comprises an object detection CNN comprising a region proposal (RP) generator module configured to process the feature map to generate a set of bounding box region proposals; a region of interest (ROI) pooling module configured to process the feature map and the set of bounding box region proposals to extract regions of interest from the feature map that are bounding box candidates; a fast-convolutional neural network (R-CNN) configured to process the bounding box candidates to generate bounding box location, orientation, and velocity of each detected object of the perception map; and classify the detected objects according to semantic classes in accordance with their respective object types;” In paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. 
In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” And in paragraph [0123]: “In one embodiment, the fast-convolutional neural network (R-CNN) 246 is a multi-layer CNN design that monitors the extracted 7×7 grid feature map computed by ROI pooling module 242 for each region proposal (RP), and outputs the 3D bounding box attribute (i.e., center position, width, height, and length), the object velocity, and object classification probabilities (i.e., the likelihood that the bounding box enclosed a vehicle, pedestrian, motorcycle, and etc.). The box velocity can be estimated through regression using neural network by monitoring the input from feature layer 232 and the previous feature layer 236.” – The different time instances at which objects are detected in Zeng correspond to the second video frame. The probability-based bounding boxes of Zeng are the partial inference path because each is a probability and thereby not a full inference.)
“At least one computer readable storage medium comprising a set of instructions, which when executed by a computing system, cause the computing system to:” of claim 14 (Paragraph [0142]: “FIGS. 1-9B. In certain embodiments, some or all steps of these methods, and/or substantially equivalent steps, are performed by execution of processor-readable instructions stored or included on a processor-readable medium.” – The processor-readable instructions on a processor-readable medium of Zeng is the computer readable storage medium comprising a set of instructions.)

Regarding claims 2, 8, 15, and 21, taking claim 2 as exemplary:
Zeng shows the system, apparatus, computer readable storage medium, and method of claims 1, 7, 14, and 20 as claimed and specified above.
And Zeng shows:
“The computing system of claim 1, wherein the instructions, when executed, cause the computing system to: conduct an early feature generation based on the second video frame;” (Paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” In paragraph [0117]: “The perception map generator module 134 generates the perception map 141 based on the feature map 132. The perception map is a human-readable representation of the driving environment that includes scenes being acquired via the sensor system 128 at any given instant. As will be described below, the perception map 141 includes multiple elements including: object (bounding boxes) locations, orientations, velocities (represented by 141-A); a freespace grid or image segmentation of freespace (represented by 141-B); road feature locations/types (represented by 141-C); and stixels (represented by 141-D).” And in paragraph [0122]: “The fast-convolutional neural network (R-CNN) 246 is a state-of-the-art visual object detection system that combines bottom-up region bounding box proposals with rich features computed by a convolutional neural network. The fast-convolutional neural network (R-CNN) 246 processes the image data from the feature map for the regions of interest to detect and localize objects, and classify the detected objects within the perception map 141. 
Objects that are detected can be classified according to semantic classes, for example, pedestrians, vehicles, etc.” – The different layers and different time instances of Zeng are the early feature generation. The setting up of regions of interest for detection in Zeng is also an early feature generation because it sets up a predetermined location of interest in a perception map to detect and classify objects. The use of time instances in Zeng shows that this is done for second frames as well.)
“and conduct a region of interest pooling based on an output of the early feature generation, wherein the second detection result is generated based on the region of interest pooling.” (Paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” In paragraph [0117]: “The perception map generator module 134 generates the perception map 141 based on the feature map 132. The perception map is a human-readable representation of the driving environment that includes scenes being acquired via the sensor system 128 at any given instant. 
As will be described below, the perception map 141 includes multiple elements including: object (bounding boxes) locations, orientations, velocities (represented by 141-A); a freespace grid or image segmentation of freespace (represented by 141-B); road feature locations/types (represented by 141-C); and stixels (represented by 141-D).” And in paragraph [0123]: “In one embodiment, the fast-convolutional neural network (R-CNN) 246 is a multi-layer CNN design that monitors the extracted 7×7 grid feature map computed by ROI pooling module 242 for each region proposal (RP), and outputs the 3D bounding box attribute (i.e., center position, width, height, and length), the object velocity, and object classification probabilities (i.e., the likelihood that the bounding box enclosed a vehicle, pedestrian, motorcycle, and etc.).” – The object classification probabilities of Zeng are the plurality of object proposals. The ROI pooling of Zeng is the region of interest pooling. The use of time instances of Zeng shows that this is done for second frames as well.)

Regarding claims 3, 9, 16, and 22, taking claim 3 as exemplary:
Zeng shows the system, apparatus, computer readable storage medium, and method of claims 1, 7, 14, and 20 as claimed and specified above.
And Zeng shows “wherein the second detection result is to include one or more objectness bounding boxes.” (Paragraph [0022]: “In one embodiment, the perception map generator module comprises an object detection CNN comprising a region proposal (RP) generator module configured to process the feature map to generate a set of bounding box region proposals; a region of interest (ROI) pooling module configured to process the feature map and the set of bounding box region proposals to extract regions of interest from the feature map that are bounding box candidates; a fast-convolutional neural network (R-CNN) configured to process the bounding box candidates to generate bounding box location, orientation, and velocity of each detected object of the perception map; and classify the detected objects according to semantic classes in accordance with their respective object types;” in paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. 
In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” And in paragraph [0123]: “In one embodiment, the fast-convolutional neural network (R-CNN) 246 is a multi-layer CNN design that monitors the extracted 7×7 grid feature map computed by ROI pooling module 242 for each region proposal (RP), and outputs the 3D bounding box attribute (i.e., center position, width, height, and length), the object velocity, and object classification probabilities (i.e., the likelihood that the bounding box enclosed a vehicle, pedestrian, motorcycle, and etc.). The box velocity can be estimated through regression using neural network by monitoring the input from feature layer 232 and the previous feature layer 236.” – The use of object bounding boxes and the determination of candidate results from the bounding boxes at time instances of Zeng show that the second detection result is to include one or more objectness bounding boxes.)

Regarding claims 4, 10, 17, and 23, taking claim 4 as exemplary:
Zeng shows the system, apparatus, computer readable storage medium, and method of claims 1, 7, 14, and 20 as claimed and specified above.
And Zeng shows “wherein the instructions, when executed, cause the computing system to repeat generation of the second detection result for a tunable plurality of video frames that are subsequent to the first video frame.” (Paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” In paragraph [0117]: “The perception map generator module 134 generates the perception map 141 based on the feature map 132. The perception map is a human-readable representation of the driving environment that includes scenes being acquired via the sensor system 128 at any given instant. As will be described below, the perception map 141 includes multiple elements including: object (bounding boxes) locations, orientations, velocities (represented by 141-A); a freespace grid or image segmentation of freespace (represented by 141-B); road feature locations/types (represented by 141-C); and stixels (represented by 141-D).” And in paragraph [0122]: “The fast-convolutional neural network (R-CNN) 246 is a state-of-the-art visual object detection system that combines bottom-up region bounding box proposals with rich features computed by a convolutional neural network. 
The fast-convolutional neural network (R-CNN) 246 processes the image data from the feature map for the regions of interest to detect and localize objects, and classify the detected objects within the perception map 141. Objects that are detected can be classified according to semantic classes, for example, pedestrians, vehicles, etc.” – The classification of objects at time instances of Zeng is the repeated generation of the second detection result for a tunable plurality of video frames that are subsequent to the first video frame.)

Regarding claims 5, 11, 18, and 24, taking claim 5 as exemplary:
Zeng shows the system, apparatus, computer readable storage medium, and method of claims 1, 7, 14, and 20 as claimed and specified above.
And Zeng shows:
“wherein the instructions, when executed, cause the computing system to: conduct an early feature generation based on the first video frame; conduct a later feature generation based on an output of the early feature generation;” (Paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” In paragraph [0117]: “The perception map generator module 134 generates the perception map 141 based on the feature map 132. The perception map is a human-readable representation of the driving environment that includes scenes being acquired via the sensor system 128 at any given instant. As will be described below, the perception map 141 includes multiple elements including: object (bounding boxes) locations, orientations, velocities (represented by 141-A); a freespace grid or image segmentation of freespace (represented by 141-B); road feature locations/types (represented by 141-C); and stixels (represented by 141-D).” And in paragraph [0122]: “The fast-convolutional neural network (R-CNN) 246 is a state-of-the-art visual object detection system that combines bottom-up region bounding box proposals with rich features computed by a convolutional neural network. 
The fast-convolutional neural network (R-CNN) 246 processes the image data from the feature map for the regions of interest to detect and localize objects, and classify the detected objects within the perception map 141. Objects that are detected can be classified according to semantic classes, for example, pedestrians, vehicles, etc.” – The different layers and different time instances of Zeng are the early and later feature generations. The setting up of regions of interest for detection in Zeng is also an early feature generation because it sets up a predetermined location of interest in a perception map to detect and classify objects.)
“generate a plurality of object proposals based on an output of the later feature generation; and conduct a region of interest pooling based on the output of the later feature generation and the plurality of object proposals,” (Paragraph [0123]: “In one embodiment, the fast-convolutional neural network (R-CNN) 246 is a multi-layer CNN design that monitors the extracted 7×7 grid feature map computed by ROI pooling module 242 for each region proposal (RP), and outputs the 3D bounding box attribute (i.e., center position, width, height, and length), the object velocity, and object classification probabilities (i.e., the likelihood that the bounding box enclosed a vehicle, pedestrian, motorcycle, and etc.).” – The object classification probabilities of Zeng are the plurality of object proposals. The ROI pooling of Zeng is the region of interest pooling.)
“wherein the first detection result is generated based on the region of interest pooling, and wherein the partial inference path bypasses the later feature generation and generation of the plurality of object proposals.” (Paragraph [0123]: “In one embodiment, the fast-convolutional neural network (R-CNN) 246 is a multi-layer CNN design that monitors the extracted 7×7 grid feature map computed by ROI pooling module 242 for each region proposal (RP), and outputs the 3D bounding box attribute (i.e., center position, width, height, and length), the object velocity, and object classification probabilities (i.e., the likelihood that the bounding box enclosed a vehicle, pedestrian, motorcycle, and etc.).” And in Paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.”  – The object classification probability of Zeng is the bypassing of later feature generation and object proposals because it simply presents an object classification for that specific feature map.)

Regarding claims 6, 12, 19, and 25, taking claim 6 as exemplary:
Zeng shows the system, apparatus, computer readable storage medium, and method of claims 1, 7, 14, and 20 as claimed and specified above.
And Zeng shows “wherein the first detection result is to include one or more object class bounding boxes.” (Paragraph [0022]: “In one embodiment, the perception map generator module comprises an object detection CNN comprising a region proposal (RP) generator module configured to process the feature map to generate a set of bounding box region proposals; a region of interest (ROI) pooling module configured to process the feature map and the set of bounding box region proposals to extract regions of interest from the feature map that are bounding box candidates; a fast-convolutional neural network (R-CNN) configured to process the bounding box candidates to generate bounding box location, orientation, and velocity of each detected object of the perception map; and classify the detected objects according to semantic classes in accordance with their respective object types;” And in Paragraph [0116]: “The feature extraction CNN 130 processes the range point data 214 to generate a range presence map 238 of range point data. Each range point indicates a value of a distance from a vehicle. The feature extraction CNN 130 concatenates each feature layer 232 with a previous feature layer 236 and a range presence map 238 to generate and output the feature map 132. The feature map 132 is the concatenated layers from feature layer 232, the previous feature layer 236, and the range presence map 238. In other words, the concatenation of range presence map 238, the current vision-based feature map 232 and a previous vision-based feature map 236 from a previous time instant form the whole feature map 132.” – The bounding box locations of detected objects, generated from a feature map covering the current and previous time instants, of Zeng are the first detection result that is to include one or more object class bounding boxes.)

Regarding claim 13:
Zeng shows the apparatus of claim 7 as claimed and specified above.
And Zeng shows “wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.” (Paragraph [0057]: “The controller 34 includes at least one processor 44 and a computer readable storage device or media 46. The processor 44 can be any custom made or commercially available processor, a central processing unit (CPU), a graphics processing unit (GPU), an auxiliary processor among several processors associated with the controller 34, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, any combination thereof, or generally any device for executing instructions” – The semiconductor-based microchip of Zeng is the logic coupled to the one or more substrates that includes transistor channel regions.)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Wheeler et al. (US 2018/0188045 A1), part of the prior art made of record, teaches the full inference and partial second inference detection results of claims 1, 7, 14, and 20 in paragraphs [0114]-[0120] through detection and classification of objects of a vehicle by continuously receiving sensor data to update a 3D representation of the surrounding environment (full inference), together with human verification of certain detected objects to resolve discrepancies (partial inference).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHANE D WOOLWINE, whose telephone number is (571) 272-4138. The examiner can normally be reached M-F, 9:30 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MIRANDA HUANG, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHANE D. WOOLWINE
Primary Examiner
Art Unit 2124

/SHANE D WOOLWINE/
Primary Examiner, Art Unit 2124