Challenge
In this post I'm going to talk about the permutation challenge. What we require to accomplish here is to compare two input strings and check if they have the same length with the same set of characters case sensitively.
Our main constraints are defined as:

Can we assume the string is ASCII?

Yes

Note: Unicode strings could require special handling depending on your language


Is whitespace important?

Yes


Is this case sensitive? 'Nib', 'bin' is not a match?

Yes


Can we use additional data structures?

Yes


Can we assume this fits in memory?

Yes

Python Solution
The unit tests provided for this challenge is as follows:
from nose.tools import assert_equal
class TestPermutation(object):
def test_permutation(self, func):
assert_equal(func(None, 'foo'), False)
assert_equal(func('', 'foo'), False)
assert_equal(func('Nib', 'bin'), False)
assert_equal(func('act', 'cat'), True)
assert_equal(func('a ct', 'ca t'), True)
assert_equal(func('dog', 'doggo'), False)
print('Success: test_permutation')
def main():
test = TestPermutation()
permutations = Permutations()
test.test_permutation(permutations.is_permutation)
try:
permutations_alt = PermutationsAlt()
test.test_permutation(permutations_alt.is_permutation)
except NameError:
# Alternate solutions are only defined
# in the solutions file
pass
if __name__ == '__main__':
main()
First I'm going to make sure the input arguments, have the valid type and the same length (passing the first two cases):
class Permutations(object):
def is_permutation(self, str1, str2):
if not (isinstance(str1, str) and isinstance(str2, str)):
return False
if not len(str1) == len(str2):
return False
Now that we know that our inputs are valid and have the same length, it's time
to check the permutation. For that we should make sure that they have the same
set characters (ignoring the order). I know that the set
data structure has
some cool features for comparing two sets by each other, and reading its
document I found symmetric_difference
function which sounds like a good match:
Return a new set with elements in either the set or other but not both.
So if the resulting value of this function applied on our input strings is empty, we can be sure they have the same set of characters:
str1 = set(str1)
str2 = set(str2)
return not str1.symmetric_difference(str2)
And it passes the test suite as you can confirm on replit:
$ open repl.it/@shahinism/PermutationPython █
The time complexity of symmetric_difference
based on Python's time complexity
is $O(len(s))$ where s is the left most set in average case. And in the worst
case it's $O(s * t)$ where t is the right most set. So it should be tolerable
for our use case (considering the normal length of words).
So far so good. Now let's look at the two interesting answers provided by the challenge repository. The first one is what I like the most:
class Permutations(object):
def is_permutation(self, str1, str2):
if str1 is None or str2 is None:
return False
return sorted(str1) == sorted(str2)
The value type validation is not what I like to use. Even in my own
implementation I would like to raise a TypeError
instead of returning False
,
however, I returned False
because it was expected in unit tests.
Let's quit ranting and look at the brilliant part of the solution sorted(str1)
== sorted(str2)
. The return value of sorted
function is a sorted list of
individual characters available in the string. The comparison between the lists
returns True
only if they have the same length and each element should
have the same index in both of the lists.
However the cleverness of this solution can become costly when the size of our
strings is bigger (probably won't happen in this usecase). The reason for that
is the builtin sorted
function in Python has the complexity of $O(n\ log\ n)$
for average case.
Now it's time for the second solution they have provided:
from collections import defaultdict
class PermutationsAlt(object):
def is_permutation(self, str1, str2):
if str1 is None or str2 is None:
return False
if len(str1) != len(str2):
return False
unique_counts1 = defaultdict(int)
unique_counts2 = defaultdict(int)
for char in str1:
unique_counts1[char] += 1
for char in str2:
unique_counts2[char] += 1
return unique_counts1 == unique_counts2
This is probably the most solid solution time wise (having time complexity of
$O(n)$. How this solution works is based on the comparison of two dict
where
the keys are unique characters and values are the number of repetitions. As a
matter of fact this solution (comparing dict of repetition counter) is enough by
itself and makes the length comparison a redundant part of the source code. The
following version would work the same:
from collections import defaultdict
class PermutationsAlt(object):
def is_permutation(self, str1, str2):
if str1 is None or str2 is None:
return False
unique_counts1 = defaultdict(int)
unique_counts2 = defaultdict(int)
for char in str1:
unique_counts1[char] += 1
for char in str2:
unique_counts2[char] += 1
return unique_counts1 == unique_counts2
You can confirm it in my replit entry as well: