# Polyglot Dojo #1: Compress String

## Challenge

The first challenge we pick to solve is Compress Challange. The description of the challenge says:

Compress a string such that 'AAABCCDDDD' becomes 'A3BC2D4'. Only compress the string if it saves space

To solve this challenge, we must satisfy the following constraints:

• We can assume the string is ASCII

• The compression should be case sensitive

• We can use additional data structures

• We can assume the string would fit into the memory

They also provide the following test suite to check our solution:

from nose.tools import assert_equal

class TestCompress(object):

def test_compress(self, func):
assert_equal(func(None), None)
assert_equal(func(''), '')
assert_equal(func('AABBCC'), 'AABBCC')
assert_equal(func('AAABCCDDDDE'), 'A3BC2D4E')
assert_equal(func('BAAACCDDDD'), 'BA3C2D4')
assert_equal(func('AAABAACCDDDD'), 'A3BA2C2D4')
print('Success: test_compress')

def main():
test = TestCompress()
compress_string = CompressString()
test.test_compress(compress_string.compress)

if __name__ == '__main__':
main()

## Python Solution

The high-level representation of the idea I can think of here is that we want to group continuous alike characters from a string, and then represent their group length as a number. Kind of like the following picture:

The first approach I tried here was using collection.Counter. Here is an example of how we can use it:

from collections import Counter

count = Counter("AABBCC")

print(count)
Counter({'A': 2, 'B': 2, 'C': 2})


Well, at first sight, it looks like a correct solution, and of course, it can satisfy all of the assertions from the suggested test suite except the last one. The problem with the last test is that the Counter doesn't look for continuity of the characters, so when we pass "AABBAA" to it, look what happens:

from collections import Counter

print(Counter("AABBAA"))
Counter({'A': 4, 'B': 2})


The Counter counts a character all over the string, which is not what we want. Finding out that is not enough; I knew I need to try another approach. To extract these groups, we can iterate through the characters of the string, and check their identity with the previous one. However, it's a bit mechanical for my taste, so I preferred to search for groupby keyword in Python's standard library. And fortunately, I got lucky and found a bit bizarre groupby function.

However, before continuing that, I should add, one exciting thing I learned while working on these challenges is, usefullness of having a local copy of language's documentations. In Linux, I'm using Zeal. For Mac users, I know there is Dash which Zeal uses its document format. Also, for hardcore Emacs users, there is a dash.el which is quite impressive, but I haven't used to it yet.

Anyway, let's get back to our business. I said, groupby was a little bit bizarre to me. Let's look at how it works:

from itertools import groupby

for key, group in groupby("AABBCC"):
print(f"{key}: {list(group)}")
A: ['A', 'A']
B: ['B', 'B']
C: ['C', 'C']


So it gives us what we want, what is bizarre about it? The function name says it is going to group by something. We haven't passed anything as the by part; I mean something like GROUP BY phrase in SQL. So, I took a second look at the documentation where it says:

The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged

The identity function. So it uses some function to compare each element's identity, which is what we want. What is the nature of that function? We will discover that in another blog post.

So after finding a single function, that provides us with all we need to satisfy the basic logic of the problem, we can form it as a solution. Here is mine:

## Rust Solution

Well, the most challenging part for me in this journey was solving it using Rust, and yet I'm not happy with the results. It's probably because it's the language I know far less about comparing to others. However, let's start to learn :blush:. Again, let's port the tests:

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn test_compress() {
assert_eq!(compress(""), "");
assert_eq!(compress("ABC"), "ABC");
assert_eq!(compress("AABBCC"), "AABBCC");
assert_eq!(compress("AAABCCDDDDE"), "A3BC2D4E");
assert_eq!(compress("BAAACCDDDD"), "BA3C2D4");
assert_eq!(compress("AAABAACCDDDD"), "A3BA2C2D4");
assert_eq!(compress(String::from("AAABAACCDDDD")), "A3BA2C2D4");
}
}


This is quite usual unit testing syntax, not much interesting. However, as you see, I've added an extra test, to make sure it can compress instances of String object just like string literals.

And here is a working solution to the problem:

\$ open repl.it/@shahinism/Compress-String-Rust

I'm going to describe this solution with a high-level perspective instead of focusing on the details. There are two reasons for this:

• As I said, I'm not happy with my solution so far. My experience is quite small on Rust, and I believe all those .to_string() and .as_str() should be eliminated somehow. So I plan to refactor this solution gradually, as I learn more about Rust's type system.

• Don't want my ego or perfectionism, get into the way of my blogging :sweat_smile:. As I said in the introduction of this series, I started it to help me learn more.

With that out of the way, let's see, what is my Rusty solution to this problem. It's quite close to the original Python answer provided in interactive challenges.

• I have a function create_part, which handles the encoding of a character, based on the number of continuous repetitions.

• And the primary compress function which loops through all the characters of the string counting the number of their continuity and appends each part to a result string.

• Finally, I check for the length of the compressed string, and only provide the compressed version as a result, if it's reduced the size of the original string.

## Conclusion

The idea of this series is so exciting to me. Well, I always wanted to improve my overall problem-solving skills, and so far, looks like this series is helping me in that regard. As you see, different paradigms of these languages, are forcing me to solve the problems with different approaches (top-down in Rust and bottom-up in Python and Clojure).

Also, the different level of abstractions provided by each language's standard library is helping me dive into different levels of problem's basics gradually.

I don't want to lie about it, I some times get mad when something doesn't make sense, yet going back and re-reading or repeating the process, is improving my overall confidence in coding. No need to say that it's also helping me to learn new abilities of a language I thought I already knew (Python) :wink:.