Authors: Runqing Xu, Debiao He, Cong Peng, Min Luo, Xiangyong Zeng
Title: Optimizing Dilithium Implementation with AVX2/-512
Journal: ACM Transactions on Embedded Computing Systems
Abstract: Dilithium is a signature scheme that NIST is currently standardizing as the Module-Lattice-Based Digital Signature Standard. Its security is based on lattice problems, and it is believed to remain secure even against attacks from large-scale quantum computers. Implementation efficiency is important for promoting the migration from current cryptographic algorithms to post-quantum ones. In this paper, we optimize the implementation of Dilithium with several new approaches. First, we improve the efficiency of parallel NTT implementations: our implementations reduce the overhead of shuffling operations and invoke fewer load instructions for the precomputed values. Second, we optimize the sampling and bit-packing of polynomial coefficients in Dilithium. A new approach to sampling secret key polynomials lets us handle twice as many coefficients within one register. The proposed approaches apply to implementations under both the AVX2 and AVX-512 instruction sets. Taking Dilithium2 as an example, our AVX2 implementation improves KeyGen, Sign, and Verify by 22.7%, 16.9%, and 13.5%, respectively, compared to the previous implementation.
Address: School of Cyber Science and Engineering, Wuhan University, Luojia Hill, Wuchang District, Wuhan, Hubei Province, China
Email: cpeng@whu.edu.cn (Cong Peng)