Congratulations on our paper's acceptance by IEEE TIFS! - Cryptography and Blockchain Technology Laboratory

Authors: Qi Feng, Debiao He, Zhe Liu, Huaqun Wang, and Kim-Kwang Raymond Choo
Title: SecureNLP: A system for multi-party privacy-preserving natural language processing
Journal: IEEE Transactions on Information Forensics and Security

Abstract: Natural language processing (NLP) allows a computer program to understand human language as it is spoken, and has been deployed in a growing number of applications, such as machine translation, sentiment analysis, and electronic voice assistants. While information obtained from different sources can enhance the accuracy of NLP models, there are also privacy implications in the collection of such massive data. Thus, in this paper, we design a privacy-preserving system, SecureNLP, focusing on the instance of a recurrent neural network (RNN)-based sequence-to-sequence with attention model for neural machine translation. Specifically, for non-linear functions such as sigmoid and tanh, we design two efficient multi-party protocols using secure multi-party computation (MPC), which are used to carry out the respective tasks in SecureNLP. We also prove the security of these two protocols (i.e., the privacy-preserving long short-term memory network PrivLSTM, and the privacy-preserving sequence-to-sequence transformation PrivSEQ2SEQ) in the semi-honest adversary model, in the sense that an honest-but-curious adversary cannot learn anything else from the messages it receives from other parties. The proposed system is implemented in C++ and Python, and the findings from the evaluation demonstrate the utility of the protocols in cross-domain NLP.

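Abstract (translated from the Chinese version): Natural language processing (NLP) studies the theories and methods that enable effective communication between humans and computers in natural language, and it is now widely used in applications such as machine translation, sentiment analysis, and intelligent voice assistants. The accuracy of NLP models depends on pooling large amounts of data, which in turn raises concerns about data privacy. This paper therefore takes the recurrent neural network-based Seq2Seq (Sequence to Sequence) deep learning model and designs a privacy-preserving sequence transformation system, SecureNLP. Specifically, it uses secure multi-party computation to deploy the non-linear activation functions sigmoid and tanh in a distributed computing framework, and on that basis securely executes the distributed sequence transformation protocols: the privacy-preserving LSTM model PrivLSTM and the privacy-preserving sequence transformation model PrivSEQ2SEQ. The security of the protocols is proved in the semi-honest model, that is, an adversary who follows the protocol honestly cannot extract any useful information from the messages it receives. Finally, experiments based on C++ and Python show that the protocols perform well and are well suited to cross-domain NLP tasks.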
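
For readers unfamiliar with secure multi-party computation, the following is a minimal sketch of additive secret sharing, a common building block of MPC. It is an illustration only and is not the paper's protocol: the ring size, the fixed-point precision, and all function names (`share`, `reconstruct`, `add_shares`, `scale_shares`) are assumptions made for this example. It shows that additions and multiplications by public constants can be done locally on shares, whereas non-linear functions such as sigmoid and tanh cannot, which is why dedicated protocols are needed for them.

```python
import secrets
from typing import List

# Assumed parameters for this illustration (not taken from the paper).
RING_BITS = 64
MODULUS = 1 << RING_BITS   # arithmetic is done modulo 2^64
FRACTION_BITS = 16         # fixed-point precision


def encode(x: float) -> int:
    """Map a real number to a fixed-point element of the ring."""
    return round(x * (1 << FRACTION_BITS)) % MODULUS


def decode(v: int) -> float:
    """Map a ring element back to a real number (upper half is negative)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / (1 << FRACTION_BITS)


def share(x: float, n_parties: int = 2) -> List[int]:
    """Split x into n_parties random shares that sum to encode(x) mod 2^64."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (encode(x) - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: List[int]) -> float:
    """Recombine all shares; any proper subset looks uniformly random."""
    return decode(sum(shares) % MODULUS)


def add_shares(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its own two shares locally, with no communication."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def scale_shares(a: List[int], k: int) -> List[int]:
    """Each party multiplies its share by a public integer k locally."""
    return [(x * k) % MODULUS for x in a]


if __name__ == "__main__":
    a = share(1.5)
    b = share(-0.25)
    print(reconstruct(add_shares(a, b)))    # 1.25
    print(reconstruct(scale_shares(a, 3)))  # 4.5
```

In this sketch no single party's share reveals anything about the underlying value, which matches the intuition behind the semi-honest security guarantee described in the abstract.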