Unescape a String that contains standard Java escape sequences

I came across a problem where I needed to parse a String (coming from a database, a file or a web service) that contained standard Java escape sequences, and convert those escape sequences to the characters they represent.

Java escape sequences can be placed in a string literal in four ways:

  1. Standard escapes with \b \f \n \r \t \" \' : These represent the control characters BS (backspace), FF (form feed), NL (newline), CR (carriage return) and TAB, plus the double and single quote.
  2. Octal escapes with \0 to \377 : These represent a single character (0-255 decimal, 0x00-0xff hexadecimal) in octal notation, using one to three octal digits. A three-digit escape must start with 0-3.
  3. Hexadecimal Unicode characters with \uXXXX : A hexadecimal representation of a Unicode character.
  4. The backslash itself with \\ : Needed whenever the string should contain a \ character.
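
The difference between an escape inside a literal and an escape inside data is easy to see in a few lines. The compiler has already translated the escapes in a literal. Text read at runtime still contains the backslash as a separate character. Here is a minimal demonstration. The class and constant names are illustrative only.

```java
public class EscapeDemo {

    // Text as it might arrive from a file or database: each escape is
    // still several separate characters, starting with a backslash.
    static final String RAW_NEWLINE = "\\n";   // backslash + 'n'
    static final String RAW_OCTAL = "\\101";   // backslash + "101"

    public static void main(String[] args) {
        // In a literal, the compiler has already replaced the escape.
        System.out.println("\n".length());        // 1 (a single NL)
        System.out.println(RAW_NEWLINE.length()); // 2 (backslash and 'n')

        // Octal and Unicode escapes in literals: octal 101 = 0x41 = 'A'.
        System.out.println("\101".equals("A"));   // true
        System.out.println("\u0041".equals("A")); // true

        // A three-digit octal escape must start with 0-3, so the literal
        // below is octal 47 (an apostrophe) followed by the character '7'.
        System.out.println("\477".equals("'7"));  // true
        System.out.println(RAW_OCTAL.length());   // 4
    }
}
```

The last octal case is exactly the situation an unescaping method has to reproduce: after a leading 4 to 7, at most one more octal digit belongs to the escape.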

Although this is an integral part of the Java compiler, there is no function in the standard Java runtime that converts such a String notation into an unescaped target String.
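
Newer runtimes have partly closed this gap. Since Java 15, String.translateEscapes() translates the standard escapes (plus \s and escaped line terminators) and octal escapes. According to its Javadoc, however, it does not translate \uXXXX Unicode escapes, so a custom method is still needed for those. The sketch below assumes a Java 15+ runtime. TranslateEscapesDemo and its translate helper are illustrative names only.

```java
public class TranslateEscapesDemo {

    // Requires Java 15 or later: String.translateEscapes() does not
    // exist in earlier releases.
    static String translate(String raw) {
        return raw.translateEscapes();
    }

    public static void main(String[] args) {
        // Tab, newline and octal escapes are all translated.
        String raw = "Name:\\tJava\\nOctal: \\101";
        // Prints "Name:", a TAB, "Java", then "Octal: A" on the next line.
        System.out.println(translate(raw));
    }
}
```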

Apache Commons Lang has a StringEscapeUtils class (moved to Apache Commons Text in newer versions) whose unescapeJava method can do this. However, that means a lot of overhead for such a small task: you need to add a rather large JAR as a dependency, and it comes with a license attached. Depending on the version, the Apache Commons implementation also lacks some functionality and has several implementation flaws.

Here’s a quick solution in one method, also available as a Gist on GitHub:

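    // The Javadoc below writes the Unicode escape as &#92;u because javac
    // translates a backslash followed by 'u' even inside comments, so a
    // literal one followed by "XXXX" would not compile.
    /**
     * Unescapes a string that contains standard Java escape sequences.
     * <ul>
     * <li><strong>\b \f \n \r \t \" \'</strong> :
     * BS, FF, NL, CR, TAB, double and single quote.</li>
     * <li><strong>\\</strong> : The backslash character itself.</li>
     * <li><strong>\X \XX \XXX</strong> : Octal character
     * specification (0 - 377, 0x00 - 0xFF); a three-digit code
     * must start with 0-3.</li>
     * <li><strong>&#92;uXXXX</strong> : Hexadecimal based Unicode character.</li>
     * </ul>
     * Unknown or incomplete escape sequences are copied unchanged.
     * 
     * @param st
     *            A string optionally containing standard Java escape sequences.
     * @return The translated string.
     */
    public static String unescapeJavaString(String st) {

        StringBuilder sb = new StringBuilder(st.length());

        for (int i = 0; i < st.length(); i++) {
            char ch = st.charAt(i);
            // Ordinary character, or a lone trailing backslash: copy as-is.
            if (ch != '\\' || i == st.length() - 1) {
                sb.append(ch);
                continue;
            }
            char nextChar = st.charAt(i + 1);

            // Octal escape: one to three digits. As in the Java language,
            // a third digit is only allowed if the first one is 0-3, so
            // the value never exceeds octal 377 (0xFF).
            if (nextChar >= '0' && nextChar <= '7') {
                int maxDigits = (nextChar <= '3') ? 3 : 2;
                int value = 0;
                int j = i + 1;
                while (j < st.length() && j <= i + maxDigits
                        && st.charAt(j) >= '0' && st.charAt(j) <= '7') {
                    value = value * 8 + (st.charAt(j) - '0');
                    j++;
                }
                sb.append((char) value);
                i = j - 1; // the loop's i++ moves past the last digit
                continue;
            }

            switch (nextChar) {
            case '\\':
                ch = '\\';
                break;
            case 'b':
                ch = '\b';
                break;
            case 'f':
                ch = '\f';
                break;
            case 'n':
                ch = '\n';
                break;
            case 'r':
                ch = '\r';
                break;
            case 't':
                ch = '\t';
                break;
            case '\"':
                ch = '\"';
                break;
            case '\'':
                ch = '\'';
                break;
            // Hex Unicode: 'u' followed by exactly four ASCII hex digits.
            case 'u': {
                int code = 0;
                boolean valid = i + 5 < st.length();
                for (int k = i + 2; valid && k <= i + 5; k++) {
                    char c = st.charAt(k);
                    int digit = (c < 128) ? Character.digit(c, 16) : -1;
                    valid = digit >= 0;
                    code = code * 16 + digit;
                }
                if (valid) {
                    sb.append((char) code);
                    i += 5; // skip 'u' and the four hex digits
                    continue;
                }
                // Incomplete sequence: keep the backslash and the 'u'.
                sb.append('\\');
                ch = 'u';
                break;
            }
            default:
                // Unknown escape: keep the backslash and the character.
                sb.append('\\');
                ch = nextChar;
                break;
            }
            i++; // skip the character that followed the backslash
            sb.append(ch);
        }
        return sb.toString();
    }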

