With recent advances in autonomous driving, voice control systems have been
increasingly adopted as a human-vehicle interaction method. This technology
enables drivers to control the vehicle with voice commands and will soon be
available in Advanced Driver Assistance Systems (ADAS). Prior work has shown
that Siri, Alexa, and Cortana are highly vulnerable to inaudible command
attacks. Such attacks could be extended to ADAS in real-world applications, and
the resulting threat is difficult to detect because of microphone
nonlinearities. In this paper, we aim to develop a more practical solution.
Because ADAS can perceive their environment through multiple sensors, we use
camera views to defend against inaudible command attacks. To this end, we
propose a novel multimodal deep learning classification system. Our
experimental results confirm the feasibility of the proposed defense, with the
best classification accuracy reaching 89.2%. Code is available at

