Emerging self-supervised learning (SSL) has become a popular image
representation encoding method to obviate the reliance on labeled data and
learn rich representations from large-scale, ubiquitous unlabelled data. Then
one can train a downstream classifier on top of the pre-trained SSL image
encoder with few or no labeled downstream data. Although extensive works show
that SSL has achieved remarkable and competitive performance on different
downstream tasks, its security concerns, e.g, Trojan attacks in SSL encoders,
are still not well-studied. In this work, we present a novel Trojan Attack
method, denoted by ESTAS, that can enable an effective and stable attack in SSL
encoders with only one target unlabeled sample. In particular, we propose
consistent trigger poisoning and cascade optimization in ESTAS to improve
attack efficacy and model accuracy, and eliminate the expensive target-class
data sample extraction from large-scale disordered unlabelled data. Our
substantial experiments on multiple datasets show that ESTAS stably achieves >
99% attacks success rate (ASR) with one target-class sample. Compared to prior
works, ESTAS attains > 30% ASR increase and > 8.3% accuracy improvement on
average.

By admin