Hardware Design of Polynomial Multiplication for Byte-Level Ring-LWE Based Cryptosystem

Title: Hardware Design of Polynomial Multiplication for Byte-Level Ring-LWE Based Cryptosystem
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Khuchit, U., Wu, L., Zhang, X., Yin, Y., Batsukh, A., Mongolyn, B., Chinbat, M.
Conference Name: 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID)
Date Published: October
Keywords: BRAMs, byte-level modulus, byte-level ring-LWE based cryptosystem, compiler security, compositionality, computational time-consuming block, cryptography, DSPs, field programmable gate arrays, Hardware, high level synthesis, high-level synthesis based hardware design methodology, ideal lattice, LAC, lattice-based cryptography, learning (artificial intelligence), logic design, Metrics, multiplication core, NIST, NIST PQC Standardization Process, polynomial multiplication, polynomials, post quantum cryptography, program compilers, pubcrawl, Resiliency, ring learning with error problem, ring LWE, Scalability, Software algorithms, Table lookup, time 4.3985 ns, time 5.052 ns, time 5.133 ns, Timing, Vivado HLS compiler, Xilinx Artix-7 family FPGA
Abstract: Ideal lattices underlie the ring learning with errors (Ring-LWE) problem. Polynomial multiplication over the ring is the most computationally intensive and time-consuming block in lattice-based cryptography. This paper presents the first hardware design of polynomial multiplication for LAC, one of the Round-2 candidates of the NIST PQC Standardization Process, which has the byte-level modulus p=251. The proposed architecture supports polynomial multiplication for different degrees n (n=512/1024/2048). To design the scheme, we used the Vivado HLS compiler, a high-level synthesis based hardware design methodology that can optimize software algorithms into actual hardware. The design takes 274/280/291 FFs and 204/217/208 LUTs on the Xilinx Artix-7 family FPGA, the platform requested by the NIST PQC competition for hardware implementations. The multiplication core uses only 1/1/2 18 Kb BRAMs, 1/1/1 DSPs, and 90/94/95 slices on the board. For the respective degrees n, our timing results are 5.052/4.3985/5.133 ns.
Citation Key: khuchit_hardware_2020
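
The operation the abstract describes, polynomial multiplication with the byte-level modulus p=251, can be illustrated with a minimal software sketch. This is not the authors' Vivado HLS design. It assumes the reduction polynomial x^n + 1, which comes from the LAC specification and is not stated in the abstract, and it uses plain schoolbook multiplication, which may differ from the method used in the paper. The function name `polymul_negacyclic` is hypothetical.

```python
P = 251  # byte-level modulus quoted in the abstract


def polymul_negacyclic(a, b, p=P):
    """Schoolbook product of a and b in Z_p[x]/(x^n + 1).

    a and b are coefficient lists of equal length n, lowest degree first.
    The reduction polynomial x^n + 1 is an assumption (see lead-in).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length n")
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            term = ai * bj
            if k >= n:
                # x^n = -1 in this ring, so wrapped terms are subtracted
                c[k - n] = (c[k - n] - term) % p
            else:
                c[k] = (c[k] + term) % p
    return c


# Small worked case with n = 4
print(polymul_negacyclic([1, 2, 3, 4], [5, 6, 7, 8]))  # [195, 215, 2, 60]
```

Because every coefficient stays below 251, each one fits in a single byte, which is what "byte-level" refers to. The same function accepts lists of length 512, 1024, or 2048, the degrees listed in the abstract.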