Print my string, Elixir!

If you have ever tried to do anything with strings in Elixir, you probably experienced being presented a binary where you wanted to see a string:

1
2
iex(1)> my_string
<<99, 111, 109, 109, 105, 116, 32, 97, 97, 97, ...>>

This actually happened to me today as I was trying to read logs from a git repository:

1
2
iex(1)> {my_string, 0} = System.cmd("git", ["log"])
{<<99, 111, 109, 109, 105, 116, 32, 97, 97, 97, ...>>, 0}

Valid?

That’s weird, right? This is the text that would printed in my terminal if I run git log. How could it not be a valid string?

Indeed, it was a valid string:

1
2
iex(2)> String.valid?(my_string)
true

Ok, if the string is valid, but it is being printed as a binary… that must mean…

Printable?

:bulb: It is not printable! :bulb:

1
2
iex(3)> String.printable?(my_string)
false

But what kind of unprintable characters could there be in my git logs?

1
2
3
4
iex(4)> my_string \
...(4)> |> String.codepoints \
...(4)> |> Enum.filter(fn(p) -> !String.printable?(p) end)
[<<194, 130>>]

The character here is a ‘BREAK PERMITTED HERE’ (U+0082).

Solution

For my use case, it was absolutely fine to remove that unprintable character and move on:

1
2
3
4
5
iex(5)> my_string \
...(5)> |> String.codepoints \
...(5)> |> Enum.filter(fn(p) -> String.printable?(p) end) \
...(5)> |> Enum.join("")
"commit aaaccbfbb6b398972dae2ad0..." # a very very long string here

Cause

But I was still curious. I thought that maybe this is the character that causes my terminal to truncate output and wait for a q press to show more, but that seemed unlikely. I checked if I would experience the same thing in a different repository. I did not. I was certain then that the issue is inside one of the commit messages. I wanted to find that commit message.

First I found the index of the unprintable character:

1
2
3
4
iex(6)> i = my_string \
...(6)> |> String.codepoints \
...(6)> |> Enum.find_index(fn(p) -> !String.printable?(p) end)
83215

Then I printed a few characters around it to read what that snippet of logs is about:

1
2
3
4
5
6
iex(7)> my_string \
...(7)> |> String.codepoints \
...(7)> |> Enum.slice(i - 1, 3) \
...(7)> |> Enum.filter(fn(p) -> String.printable?(p) end) \
...(7)> |> Enum.join("")
"â¬"

By reading a longer snippet (that I cannot share with you), I figured out that this is indeed a part of a commit message and I was able to find the hash of that commit. The whole message made sense and only had a few gibberish characters at the end. I asked the author of that message about it. He said he meant to write @. At least I know that it was not git’s fault :smile:.