Comparative Benchmarking of Single-View and Multi-View Models for Partially Obscured Object Detection with C5ISR

Conventional object detection models like YOLO often struggle with occluded objects because they rely on a single viewpoint and limited scene context. In our capstone project, we benchmarked BEVFormer, a transformer-based model that builds a bird’s-eye-view representation from multiple camera views, against YOLO on the NuScenes dataset. NuScenes contains diverse urban scenes with detailed annotations, including a per-object visibility level, which we leverage to assess model performance on occluded objects. To evaluate whether BEVFormer improves detection of occluded objects, we conduct an experiment in which we repeatedly sample 100 observations from each NuScenes visibility level. For each sample, we compute the total number of objects detected by YOLO and by BEVFormer. Repeating this process 1,000 times approximates the sampling distribution of the detected-object count, and we visualize the resulting distributions to compare the two models across visibility levels.
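As a rough illustration of this resampling procedure, the sketch below assumes a per-object table containing the NuScenes visibility level and a detected/missed flag for each model; the column names and function name are hypothetical placeholders, not our actual pipeline.

```python
import numpy as np
import pandas as pd

# Assumed input: one row per annotated object, with its NuScenes visibility
# level and whether each model detected it. Column names are placeholders.
# detections = pd.DataFrame({
#     "visibility_level": [...],    # e.g. the NuScenes visibility bins
#     "detected_yolo": [...],       # 1 if YOLO detected the object, else 0
#     "detected_bevformer": [...],  # 1 if BEVFormer detected the object, else 0
# })

def detection_count_distribution(detections, n_per_sample=100, n_iterations=1000, seed=0):
    """Approximate the sampling distribution of detected-object counts per
    visibility level by repeatedly drawing samples of 100 objects."""
    rng = np.random.default_rng(seed)
    records = []
    for level, group in detections.groupby("visibility_level"):
        for i in range(n_iterations):
            # Draw 100 objects (with replacement) from this visibility level
            idx = rng.choice(group.index, size=n_per_sample, replace=True)
            sample = group.loc[idx]
            records.append({
                "visibility_level": level,
                "iteration": i,
                "yolo_detected": int(sample["detected_yolo"].sum()),
                "bevformer_detected": int(sample["detected_bevformer"].sum()),
            })
    return pd.DataFrame(records)
```

The resulting table can then be plotted, for example as per-visibility-level histograms of the YOLO versus BEVFormer counts, to compare the two sampling distributions.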

The results revealed that BEVFormer significantly outperformed YOLO in detecting occluded objects, particularly when visibility was below 40%: YOLO’s accuracy dropped to 7.7% in these cases, while BEVFormer maintained over 30% accuracy. We attribute this gap to BEVFormer’s ability to integrate multi-view data, which enables better occlusion handling. To understand BEVFormer’s decision-making process and identify the key factors behind its robustness in challenging occlusion scenarios, we examined saliency maps and conducted a feature contribution analysis. The findings from this project offer our stakeholder practical insights for optimizing their high-stakes object detection systems, improving real-world performance without requiring significant additional data collection.
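As an illustration of the saliency analysis, here is a minimal sketch of gradient-based saliency for a generic PyTorch detector; the model and the scalar detection-score function are placeholders for illustration, not BEVFormer’s actual interface.

```python
import torch

def saliency_map(model, image, score_fn):
    """Gradient-based saliency: how strongly each pixel influences a scalar
    detection score. `model` is any torch.nn.Module and `score_fn` reduces
    its raw output to a scalar (both are illustrative placeholders)."""
    model.eval()
    image = image.clone().requires_grad_(True)   # (C, H, W) input tensor
    score = score_fn(model(image.unsqueeze(0)))  # e.g. top box confidence
    score.backward()                             # gradients w.r.t. pixels
    return image.grad.abs().amax(dim=0)          # per-pixel saliency map
```

Inspecting where such maps concentrate for heavily occluded objects helps indicate which views and regions drive BEVFormer’s detections.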

Mentor: Nick Eubank