Atbash Cipher Code Refactoring Record
Introduction
This exercise comes from the main exercises of the Rust track on Exercism.
I asked some friends from a small Rust community and got these elegant solutions. They are beautiful and a little tricky, so I decided to write them down in the hope that they are useful for you as well.
You may learn how to “chunk” a vector of characters in Rust and join the chunks into a string without using a loop.
Exercise Introduction:
Instructions
Create an implementation of the atbash cipher, an ancient encryption system created in the Middle East.
The Atbash cipher is a simple substitution cipher that relies on transposing all the letters in the alphabet such that the resulting alphabet is backwards. The first letter is replaced with the last letter, the second with the second-last, and so on.
An Atbash cipher for the Latin alphabet would be as follows:
Plain:  abcdefghijklmnopqrstuvwxyz
Cipher: zyxwvutsrqponmlkjihgfedcba
It is a very weak cipher because it only has one possible key, and it is a simple monoalphabetic substitution cipher. However, this may not have been an issue in the cipher’s time.
Ciphertext is written out in groups of fixed length, the traditional group size being 5 letters, and punctuation is excluded. This is to make it harder to guess things based on word boundaries.
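The substitution rule above can be sketched for a single character. This is only an illustration: the helper name `atbash_char` is hypothetical, it uses byte arithmetic rather than the lookup strings used in the solutions below, and it assumes lowercase ASCII input. Passing digits through unchanged is an assumption based on the exercise's test suite, not on the text above.

```rust
/// Map one character through the Atbash substitution.
/// Hypothetical helper for illustration; not part of the original solutions.
fn atbash_char(c: char) -> Option<char> {
    match c {
        // Mirror the letter inside the alphabet: a <-> z, b <-> y, ...
        'a'..='z' => Some((b'a' + b'z' - c as u8) as char),
        // Assumption: digits are kept as they are.
        '0'..='9' => Some(c),
        // Spaces, punctuation and everything else are dropped.
        _ => None,
    }
}
```

Because the mapping mirrors the alphabet, applying it twice returns the original letter, which is why encoding and decoding share the same substitution.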
Examples
- Encoding "test" gives "gvhg"
- Decoding "gvhg" gives "test"
- Decoding "gsvjf rxpyi ldmul cqfnk hlevi gsvoz abwlt" gives "thequickbrownfoxjumpsoverthelazydog"
Test suite
use atbash_cipher as cipher;
Solution
Version 1
const PLAIN: &str = "abcdefghijklmnopqrstuvwxyz";
const CIPHER: &str = "zyxwvutsrqponmlkjihgfedcba";
/// "Encipher" with the Atbash cipher.
/// "Decipher" with the Atbash cipher.
L11 uses .rev() to pinpoint the index in the CIPHER text. This is a little bit tricky.
In this version, I didn’t find a way to wrap the string pushing into a .map(), so the code looks C/C++-style and ugly to me, since we need a loop to append the spaces. This can be changed!
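The loop-based approach described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the original Version 1: the body of `atbash()` and the index lookup via `PLAIN.find()` are my own choices (the original used a `.rev()` trick that is not reproduced here), and keeping digits is assumed from the exercise's test suite.

```rust
const PLAIN: &str = "abcdefghijklmnopqrstuvwxyz";
const CIPHER: &str = "zyxwvutsrqponmlkjihgfedcba";

/// Substitute every ASCII letter, keep digits, drop everything else.
/// Assumed shape of the helper; the original body is not shown in the post.
fn atbash(text: &str) -> Vec<char> {
    text.to_lowercase()
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| match PLAIN.find(c) {
            // The index in PLAIN selects the mirrored letter in CIPHER.
            Some(i) => CIPHER.as_bytes()[i] as char,
            // Digits are not in PLAIN and pass through unchanged.
            None => c,
        })
        .collect()
}

/// "Encipher" with the Atbash cipher.
pub fn encode(plain: &str) -> String {
    let mut out = String::new();
    // Explicit loop: push a space before every group of five except the first.
    for (i, c) in atbash(plain).into_iter().enumerate() {
        if i > 0 && i % 5 == 0 {
            out.push(' ');
        }
        out.push(c);
    }
    out
}

/// "Decipher" with the Atbash cipher.
pub fn decode(cipher: &str) -> String {
    atbash(cipher).into_iter().collect()
}
```

The `for` loop with a manual `push(' ')` is the part the later versions replace with iterator adapters.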
Version 2
const PLAIN: &str = "abcdefghijklmnopqrstuvwxyz";
const CIPHER: &str = "zyxwvutsrqponmlkjihgfedcba";
/// "Encipher" with the Atbash cipher.
/// "Decipher" with the Atbash cipher.
In version 2, I’ve changed how the encoded string is wrapped into groups. It happens at L19~24.
Explanation:
1. L21 takes the chunk size and splits the vector into several slices.
2. L22 maps each chunk into a String.
3. L23 collects all the data generated in step 2 and outputs it as Vec<String>.
4. L24 uses the .join() method to join the items in the Vec<String> with spaces and outputs the final result.
So, basically, you don’t need a loop any more to add a space after every 5 characters. Replacing explicit loops with iterator chains is an elegant way to process data in Rust.
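The four steps of the explanation can be sketched in isolation like this. The function name `group_by_five` is hypothetical and the input is assumed to be the already-substituted characters; the original code inlines this chain inside the encoding function.

```rust
/// Group already-substituted characters into space-separated blocks of five.
/// Hypothetical helper name; it follows the chunk / map / collect / join steps.
fn group_by_five(chars: &[char]) -> String {
    chars
        // Split the slice into sub-slices of at most five characters.
        .chunks(5)
        // Turn each sub-slice into its own String.
        .map(|chunk| chunk.iter().collect::<String>())
        // Gather the groups into a Vec<String>.
        .collect::<Vec<String>>()
        // Join the groups with single spaces.
        .join(" ")
}
```

Note that the last chunk may be shorter than five characters, which is what lets `.chunks()` handle input whose length is not a multiple of five.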
Version 3
const PLAIN: &str = "abcdefghijklmnopqrstuvwxyz";
const CIPHER: &str = "zyxwvutsrqponmlkjihgfedcba";
/// "Encipher" with the Atbash cipher.
/// "Decipher" with the Atbash cipher.
In version 3, to reduce memory usage, I’ve removed one layer of .collect(), since each .collect() allocates on the heap, which affects performance. So I avoid using .chunks() and allocating on the heap again in .map(), and replace them with an iterator. I chose iterators because they are lazily evaluated and don’t allocate on the heap until explicitly told to (with .collect(), for instance).
Also, .enumerate() yields (index, value) pairs, so we can use .flat_map() to flatten the nested iterators and collect them into a String only once.
This saves memory allocations on the heap. Goal achieved.
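A minimal sketch of the .enumerate() plus .flat_map() idea, assuming the substituted characters still arrive as a Vec<char> as in the earlier versions. The function name `group_without_chunks` and the use of `Option` chained with `std::iter::once` are my own choices; the original closure may look different.

```rust
/// Insert a space before every fifth character without building
/// intermediate Strings. Hypothetical helper for illustration.
fn group_without_chunks(chars: Vec<char>) -> String {
    chars
        .into_iter()
        // Pair every character with its position.
        .enumerate()
        .flat_map(|(i, c)| {
            // Emit a separator before positions 5, 10, 15, ... but not before 0.
            let sep = if i > 0 && i % 5 == 0 { Some(' ') } else { None };
            // Yields either [' ', c] or just [c]; flat_map flattens the result.
            sep.into_iter().chain(std::iter::once(c))
        })
        // The only .collect() in the grouping step.
        .collect()
}
```

Compared with the chunk-based version, no per-group String and no Vec<String> are created; the characters flow straight into the final String.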
Version 4
const PLAIN: &str = "abcdefghijklmnopqrstuvwxyz";
const CIPHER: &str = "zyxwvutsrqponmlkjihgfedcba";
/// return Iterator type: https://gist.github.com/conundrumer/4e57c14705055bb2deac1b9fde84f83b
+ 'a
/// "Encipher" with the Atbash cipher.
/// "Decipher" with the Atbash cipher.
In version 4, atbash() no longer returns Vec<char>, which avoids another memory allocation on the heap. But returning an Iterator is a little bit tricky. You can find more examples here.
Another change is that I moved .to_lowercase() back into atbash() and applied it with .flat_map(), since handling uppercase characters outside atbash() is not graceful and is not good engineering practice.
So, basically, in this version .collect() is only used once, and heap memory is allocated only for the final String.
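Putting the pieces together, a version along these lines might look like the sketch below. It is an assumption-laden reconstruction rather than the original code: only the constants, the doc comments and the `+ 'a` bound are taken from the post, while the bodies, the `PLAIN.find()` lookup and the handling of digits are my own.

```rust
const PLAIN: &str = "abcdefghijklmnopqrstuvwxyz";
const CIPHER: &str = "zyxwvutsrqponmlkjihgfedcba";

/// Lazily substitute the characters of `text`.
/// The returned iterator borrows `text`, hence the `+ 'a` bound.
fn atbash<'a>(text: &'a str) -> impl Iterator<Item = char> + 'a {
    text.chars()
        // char::to_lowercase returns an iterator, so flat_map flattens it.
        .flat_map(|c| c.to_lowercase())
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| match PLAIN.find(c) {
            // Letters are mirrored through the CIPHER string.
            Some(i) => CIPHER.as_bytes()[i] as char,
            // Assumption: digits pass through unchanged.
            None => c,
        })
}

/// "Encipher" with the Atbash cipher.
pub fn encode(plain: &str) -> String {
    atbash(plain)
        .enumerate()
        .flat_map(|(i, c)| {
            // Separator before every group of five except the first.
            let sep = if i > 0 && i % 5 == 0 { Some(' ') } else { None };
            sep.into_iter().chain(std::iter::once(c))
        })
        .collect()
}

/// "Decipher" with the Atbash cipher.
pub fn decode(cipher: &str) -> String {
    atbash(cipher).collect()
}
```

In this sketch each public function calls .collect() exactly once, and no Vec<char> is built along the way.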