Efficient decryption of AES/CFB8

I currently use this function to decrypt a data stream encrypted with AES in CFB8 mode: https://github.com/Lazersmoke/civskell/blob/ebf4d761362ee42935faeeac0fe447abe96db0b5/src/Civskell/Tech/Encrypt.hs#L167-L175

    cfb8Decrypt :: AES128 -> BS.ByteString -> BS.ByteString -> (BS.ByteString,BS.ByteString)
    cfb8Decrypt c i = BS.foldl magic (BS.empty,i)
      where
        magic (ds,iv) d = (ds `BS.snoc` pt,ivFinal)
          where
            pt = BS.head (ecbEncrypt c iv) `xor` d
            -- snoc on cipher always
            ivFinal = BS.tail iv `BS.snoc` d

In case you don't understand Haskell, here is a quick rundown of how I believe this code works (I did not write it):

Given an IV and a list of encrypted bytes, for every encrypted byte:

- Encrypt the IV in ECB mode.
- Take the first byte of the encrypted IV and xor it with the encrypted byte. This is the next plaintext byte.
- Remove the first byte from the IV.
- Append the encrypted byte to the IV.
- The next character will be decrypted using this new IV.
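The steps above can be sketched as a single function that handles one ciphertext byte. This is only an illustration: the names `BlockEncrypt` and `cfb8DecryptStep` are hypothetical, and the block cipher is passed in as a plain function so that the sketch depends only on `bytestring`. In the setting of the question, that function would presumably be `ecbEncrypt c` applied to the 16-byte register.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (xor)
import Data.Word (Word8)

-- | Hypothetical stand-in for a 16-byte block encryption, for example
-- @ecbEncrypt c@ from cryptonite. Any function on 16-byte blocks fits.
type BlockEncrypt = BS.ByteString -> BS.ByteString

-- | One CFB8 decryption step (hypothetical helper name): takes the current
-- 16-byte register and one ciphertext byte, and returns the plaintext byte
-- together with the register to use for the next byte.
cfb8DecryptStep :: BlockEncrypt -> BS.ByteString -> Word8 -> (Word8, BS.ByteString)
cfb8DecryptStep encryptBlock iv cipherByte = (plainByte, nextIv)
  where
    -- Encrypt the register and keep only the first byte of the result.
    keystreamByte = BS.head (encryptBlock iv)
    -- Xor that byte with the ciphertext byte to recover the plaintext byte.
    plainByte = keystreamByte `xor` cipherByte
    -- Drop the oldest register byte and append the ciphertext byte.
    nextIv = BS.tail iv `BS.snoc` cipherByte
```

Note that the whole 16-byte register is encrypted for every single byte of input, which is where the cost of this mode comes from.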

Note that the ECB-mode encryption is handled by the cryptonite library. I could not find a library supporting CFB8.

Now, this works. However, with the amount of data I need to decrypt, it caps out one of my CPU cores and 80% of the time is just spent on decrypting.

The incoming data is not even that much, so this is not acceptable. Unfortunately, my knowledge of cryptography is rather limited and resources on CFB8 seem rather sparse. It appears that CFB8 is an uncommon mode of operation, also indicated by the lack of library support.

So, my question then is: How would I go about optimising this?

The incoming data is from a TCP stream, but the information is grouped into packets. The cfb8Decrypt function is called 2-5 times per packet, depending on the size. This is necessary, because the length of the packet is transmitted at the beginning, but the length of this size information is variable. After 1-4 decryptions are used to decrypt the length, the entire packet will be decrypted at once. I thought about trying to reduce this, but I am unsure if it would have any effect on speed at all.

Edit: Profiling results: http://svgur.com/i/40b.svg

Best answer

CFB8 was designed to have good error propagation properties over a noisy channel. It is well known not to be fast: it is about 16 times slower than a full-block mode, because it needs one AES block encryption for every byte rather than one for every 16 bytes. It is not widely used today, as we tend to use CRCs at the data layer and MACs for cryptographic integrity against deliberate attacks.
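To make the factor of 16 concrete, here is a small counting sketch. The function names are hypothetical, and the comparison assumes a mode that consumes one full 16-byte AES block per cipher call; it counts calls only and says nothing about actual throughput.

```haskell
-- | Block-cipher calls needed to process n bytes in CFB8: one per byte.
cfb8BlockCalls :: Int -> Int
cfb8BlockCalls n = n

-- | Block-cipher calls for a mode that consumes a full 16-byte block per
-- call, rounding up for a final partial block.
fullBlockCalls :: Int -> Int
fullBlockCalls n = (n + 15) `div` 16
```

For a 4096-byte packet, `cfb8BlockCalls 4096` is 4096 while `fullBlockCalls 4096` is 256, a ratio of 16.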

How can you speed it up? The only thing you can really do is to use a fast library. The library you are currently using seems to have support for AES-NI, so make sure that is enabled on your CPU and BIOS.

However, it is very likely that it won't speed up much if you have to call it block by block. You really want to use a native call that takes the whole packet and decrypts it. Even at its slowest, on an Atom implementing TLS, AES-NI still reaches 20 MiB/s, and on server chips it often goes far beyond 1 GiB/s. Assembly or optimized C should be about 6 to 7 times slower when AES-NI is not available.

Functional programming languages like Haskell are not really designed for fast I/O or fast bit operations. So you can bet that it will be much, much slower than e.g. Java or C#, and those are already much slower than native code, let alone assembly code or specialized instructions.

Memory nowadays is pretty fast; CPUs, however, are much, much faster. So spurious memory allocations and copying should be avoided (again, not that easy to do in a fully functional language, which is all the more reason to do as much as possible in native code). Do, however, make sure that there are no buffer overflow issues, or you will have fast AES/CFB within an insecure application.
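As one illustration of the allocation point that stays inside Haskell, the output copying in the question's function can be reduced. There, `BS.snoc` copies the whole accumulated plaintext for every byte, so the work grows quadratically with packet size. The sketch below uses `BS.mapAccumL` from `bytestring` to fill the output in a single pass. It is not the native whole-packet call recommended above and it still performs one block encryption per byte; the names `BlockEncrypt` and `cfb8DecryptLinear` are hypothetical, and the block cipher is again passed in as a plain function (presumably `ecbEncrypt c` in the question's setting).

```haskell
import qualified Data.ByteString as BS
import Data.Bits (xor)
import Data.Word (Word8)

-- | Hypothetical stand-in for a 16-byte block encryption such as
-- @ecbEncrypt c@ from cryptonite.
type BlockEncrypt = BS.ByteString -> BS.ByteString

-- | CFB8 decryption that writes the plaintext in one pass.
-- Returns (plaintext, final register), in the same order as the
-- function from the question.
cfb8DecryptLinear
  :: BlockEncrypt -> BS.ByteString -> BS.ByteString -> (BS.ByteString, BS.ByteString)
cfb8DecryptLinear encryptBlock iv0 ciphertext = (plaintext, ivFinal)
  where
    -- mapAccumL threads the register through the input and allocates the
    -- output once, instead of copying it for every appended byte.
    (ivFinal, plaintext) = BS.mapAccumL step iv0 ciphertext

    step :: BS.ByteString -> Word8 -> (BS.ByteString, Word8)
    step iv c =
      ( BS.tail iv `BS.snoc` c               -- next register (16-byte copy)
      , BS.head (encryptBlock iv) `xor` c )  -- plaintext byte
```

Because the final register is returned, the function can be called on successive chunks of a stream by passing each call's register to the next, which matches how the length prefix and packet body are decrypted separately in the question.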
