Towards efficient self-supervised representation learning in speech processing

Speech processing models are computationally expensive, raising environmental concerns due to their high energy consumption. ESSL (Efficient Self-Supervised Learning) addresses this issue by enabling pretraining on a single GPU in only 28 hours. This reduction in computational cost amounts to an improvement of up to two orders of magnitude over existing speech models. The source code is available on GitHub under the MIT license.