Doing More with Less: Cost-Effective Infrastructure for Automotive Vision Capabilities
Many safety-critical cyber-physical systems rely on advanced sensing capabilities to react to changing
environmental conditions. However, cost-effective deployments of such capabilities have remained
elusive. Such deployments will require software infrastructure that enables multiple sensor-processing
streams to be multiplexed onto a common hardware platform at reasonable cost, as well as tools and
methods for validating that required processing rates can be maintained.
The choice of hardware platform to utilize in autonomous vehicles is not straightforward. One choice
that is receiving considerable attention today is the use of energy-efficient multicore platforms equipped
with graphics processing units (GPUs) that can speed up mathematical computations inherent to signal
processing, image processing, motion planning, etc. Perhaps the most prominent such platform today is
NVIDIA’s Jetson TX1. However, no published study exists that expressly evaluates the effectiveness of
the TX1, or any other comparable energy-efficient embedded GPU platform, in hosting safety-critical real-time
workloads. In this work, we present an evaluation of the TX1 that seeks to determine its suitability
for hosting safety-critical workloads arising in autonomous-driving use cases.
We first present results from benchmark experiments designed to resolve basic GPU management
options. One such option is zero-copy memory, a feature of integrated GPUs, which share DRAM between
the CPU and the GPU. Our experiments not only confirmed that using zero-copy memory is unlikely to
provide a performance benefit, but also showed that changing CUDA versions can have a major impact on
measurement-based timing analysis.
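To make the zero-copy option concrete, the following is a minimal sketch of how a zero-copy buffer is
typically allocated and used in CUDA; the scale kernel and buffer size are illustrative placeholders,
not code from our benchmarks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: doubles each element of the buffer in place.
__global__ void scale(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

int main() {
  const int n = 1 << 20;
  float *host_ptr = nullptr, *dev_ptr = nullptr;

  // Mapped (zero-copy) host allocations must be enabled before use.
  cudaSetDeviceFlags(cudaDeviceMapHost);

  // Allocate pinned host memory mapped into the GPU's address space.
  // On an integrated GPU such as the TX1, CPU and GPU share the same
  // physical DRAM, so the kernel accesses this buffer directly with
  // no intervening cudaMemcpy.
  cudaHostAlloc((void **)&host_ptr, n * sizeof(float), cudaHostAllocMapped);
  cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0);

  for (int i = 0; i < n; i++) host_ptr[i] = 1.0f;

  scale<<<(n + 255) / 256, 256>>>(dev_ptr, n);
  cudaDeviceSynchronize();  // after this, results are visible to the CPU

  printf("host_ptr[0] = %f\n", host_ptr[0]);  // prints 2.000000
  cudaFreeHost(host_ptr);
  return 0;
}
```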
Another option that we consider is GPU co-scheduling, i.e., allowing kernels from multiple tasks to
execute on the GPU concurrently. We found that co-scheduling appears to be a viable option for improving
utilization, as long as it is carefully evaluated for each individual use case.
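As a minimal sketch of what co-scheduling means at the CUDA level, the following submits two tasks'
kernels on separate streams, removing any ordering constraint between them and leaving the GPU free to
overlap their execution; the busy_work kernel and all sizes are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Illustrative busy-work kernel standing in for one task's GPU work.
__global__ void busy_work(float *data, int n, int iters) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  float v = data[i];
  for (int k = 0; k < iters; k++) v = v * 1.000001f + 0.5f;
  data[i] = v;
}

int main() {
  const int n = 1 << 18;
  float *a, *b;
  cudaMalloc((void **)&a, n * sizeof(float));
  cudaMalloc((void **)&b, n * sizeof(float));

  // Each "task" gets its own stream; work in different streams has
  // no ordering constraint, so the GPU may co-schedule it.
  cudaStream_t s1, s2;
  cudaStreamCreate(&s1);
  cudaStreamCreate(&s2);

  // Submit both tasks' kernels. Whether they actually overlap depends
  // on how many compute resources each occupies, which is why each
  // use case must be evaluated individually.
  busy_work<<<(n + 255) / 256, 256, 0, s1>>>(a, n, 10000);
  busy_work<<<(n + 255) / 256, 256, 0, s2>>>(b, n, 10000);

  cudaDeviceSynchronize();
  cudaStreamDestroy(s1);
  cudaStreamDestroy(s2);
  cudaFree(a);
  cudaFree(b);
  return 0;
}
```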
Finally, to see whether the TX1 is capable of predictably sustaining an autonomous-vehicle workload,
we conducted case-study experiments running workloads that perform computer-vision tasks. Specifically,
we evaluated CaffeNet, an image-classification task, and a separate road-sign-recognition task. We found
that both CaffeNet and sign recognition can run at approximately 30 frames per second on the TX1 in
isolation. Applying our findings on co-scheduling, we conclude that these tasks should be able to
continue running predictably at approximately 24 frames per second when co-scheduled.
We also performed an in-depth case study of an object detection system. Given an image, the detection
system produces bounding boxes that cover generic “objects” of interest. This is an important task for
camera-based advanced driver assistance systems (ADAS), because it encompasses pedestrian, car, and
sign detection.
Current state-of-the-art object detection systems are variants of the following approach: (i) propose
bounding boxes, (ii) resample pixels or features for each bounding box, and (iii) apply a high-quality
classifier. However, processing a large number of bounding boxes can be so computationally expensive
that even powerful GPUs may be unable to achieve predictable real-time performance. We present a
different approach using a single deep convolutional neural network. It skips proposing bounding boxes,
and instead generates the boxes directly with the neural network. This approach turns out to achieve
not only better accuracy than existing methods but also faster processing times. We break down
its entire computation into a computational graph and suggest approaches to make it satisfy real-time
constraints.
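As one example of how such a breakdown can be checked against real-time constraints, the following
sketch times individual stages of a GPU pipeline with CUDA events; the stage_a and stage_b kernels are
hypothetical placeholders standing in for the detection network's actual stages.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels standing in for stages of the detection
// pipeline's computational graph (e.g., convolution layers, box
// decoding); the real stages would replace these.
__global__ void stage_a(float *d, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) d[i] += 1.0f;
}
__global__ void stage_b(float *d, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) d[i] *= 0.5f;
}

int main() {
  const int n = 1 << 20;
  float *buf;
  cudaMalloc((void **)&buf, n * sizeof(float));

  cudaEvent_t start, mid, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&mid);
  cudaEventCreate(&stop);

  // Record events around each stage so per-stage GPU time can be
  // compared against a per-frame deadline (e.g., 33 ms for 30 FPS).
  cudaEventRecord(start);
  stage_a<<<(n + 255) / 256, 256>>>(buf, n);
  cudaEventRecord(mid);
  stage_b<<<(n + 255) / 256, 256>>>(buf, n);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float ms_a = 0.0f, ms_b = 0.0f;
  cudaEventElapsedTime(&ms_a, start, mid);
  cudaEventElapsedTime(&ms_b, mid, stop);
  printf("stage_a: %.3f ms, stage_b: %.3f ms\n", ms_a, ms_b);

  cudaFree(buf);
  return 0;
}
```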