Unescape a String that contains standard Java escape sequences

I came across a problem where I needed to parse a String (coming from a database, file or web service) that contained standard Java escape sequences, and had to convert those escape sequences to the proper characters.

Java escape sequences can be placed in a string literal in four ways:

  1. Standard escapes with \b \f \n \r \t \" \' : These represent the standard control characters BS, FF, NL, CR, TAB and the double and single quote.
  2. Octal escapes with \0 to \377 : These represent a single character (0-255 decimal, 0x00-0xff in hexadecimal) in octal notation.
  3. Hexadecimal Unicode character with \uXXXX : A hexadecimal representation of a Unicode character.
  4. The \ character itself with \\ : If you want to type the \ character itself.
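
To see the four forms side by side, here is a small sketch that lets the compiler resolve each kind of escape and checks the resulting character. The class name EscapeLiterals and the method allMatch() are made up for this illustration:

```java
// Shows how the compiler itself interprets each kind of escape in a literal.
class EscapeLiterals {

    /** Returns true if every escape form below denotes the expected character. */
    static boolean allMatch() {
        return "\t".charAt(0) == 9      // standard escape: TAB is character code 9
            && "\101".equals("A")       // octal escape: 101 octal = 65 decimal = 'A'
            && "\u0041".equals("A")     // Unicode escape: 0041 hex = 'A'
            && "\\".length() == 1;      // an escaped backslash is a single character
    }

    public static void main(String[] args) {
        System.out.println(allMatch());
    }
}
```

The problem described in this post starts when the same sequences arrive at runtime as plain text, where the compiler never gets a chance to translate them.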

Though this is an integral part of the Java compiler when compiling code, there was, at least before Java 15, no function in the standard Java runtime that converts such a String notation into an unescaped target String.
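
If you are on Java 15 or later, String.translateEscapes() should cover part of this job in the standard library. As far as I can tell from its documentation, it handles the standard and octal escapes but not \uXXXX, so treat the following as a rough sketch rather than a drop-in replacement. The class name TranslateEscapesDemo and the helper unescape() are my own:

```java
// Assumes Java 15 or later, where String.translateEscapes() is available.
class TranslateEscapesDemo {

    /** Translates backslash sequences found in runtime text into real characters. */
    static String unescape(String raw) {
        return raw.translateEscapes();
    }

    public static void main(String[] args) {
        // The doubled backslashes make this the plain text: col1, backslash, t, col2, backslash, 101
        String raw = "col1\\tcol2\\101";
        System.out.println(unescape(raw));
    }
}
```

On older JDKs, or when Unicode escapes have to be handled too, a hand-written method is still needed.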

Apache Commons has a StringEscapeUtils class that can do this, but it brings a lot of overhead for just a small task: you need to bundle a rather large JAR, and it comes with a license attached. Depending on the version, the Apache Commons implementation also lacks some functionality and has several implementation flaws.

Here’s a quick solution in one method, also available as a Gist on GitHub:

    /**
     * Unescapes a string that contains standard Java escape sequences.
     * <ul>
     * <li><strong>\b \f \n \r \t \" \'</strong> :
     * BS, FF, NL, CR, TAB, double and single quote.</li>
     * <li><strong>\X \XX \XXX</strong> : Octal character
     * specification (0 - 377, 0x00 - 0xFF).</li>
     * <li><strong>&#92;uXXXX</strong> : Hexadecimal based Unicode character.</li>
     * </ul>
     * 
     * @param st
     *            A string optionally containing standard java escape sequences.
     * @return The translated string.
     */
    public String unescapeJavaString(String st) {

        StringBuilder sb = new StringBuilder(st.length());

        for (int i = 0; i < st.length(); i++) {
            char ch = st.charAt(i);
            if (ch == '\\') {
                char nextChar = (i == st.length() - 1) ? '\\' : st
                        .charAt(i + 1);
                // Octal escape?
                if (nextChar >= '0' && nextChar <= '7') {
                    String code = "" + nextChar;
                    i++;
                    if ((i < st.length() - 1) && st.charAt(i + 1) >= '0'
                            && st.charAt(i + 1) <= '7') {
                        code += st.charAt(i + 1);
                        i++;
                        // A third digit is only valid if the first is 0-3 (max \377).
                        if (nextChar <= '3' && (i < st.length() - 1)
                                && st.charAt(i + 1) >= '0'
                                && st.charAt(i + 1) <= '7') {
                            code += st.charAt(i + 1);
                            i++;
                        }
                    }
                    sb.append((char) Integer.parseInt(code, 8));
                    continue;
                }
                switch (nextChar) {
                case '\\':
                    ch = '\\';
                    break;
                case 'b':
                    ch = '\b';
                    break;
                case 'f':
                    ch = '\f';
                    break;
                case 'n':
                    ch = '\n';
                    break;
                case 'r':
                    ch = '\r';
                    break;
                case 't':
                    ch = '\t';
                    break;
                case '\"':
                    ch = '\"';
                    break;
                case '\'':
                    ch = '\'';
                    break;
                // Hex Unicode: u????
                case 'u':
                    if (i >= st.length() - 5) {
                        ch = 'u';
                        break;
                    }
                    int code = Integer.parseInt(
                            "" + st.charAt(i + 2) + st.charAt(i + 3)
                                    + st.charAt(i + 4) + st.charAt(i + 5), 16);
                    sb.append(Character.toChars(code));
                    i += 5;
                    continue;
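                default:
                    // Unknown escape: keep the backslash and the character after it.
                    sb.append('\\');
                    ch = nextChar;
                    break;
                }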
                i++;
            }
            sb.append(ch);
        }
        return sb.toString();
    }

Does Java automatically switch daylight saving time?

Short answer: Yes, it does.

Long answer: The time in Java is just a simple long value (milliseconds since January 1st 1970, 00:00 UTC) without any information about the time zone. The java.util.Date and java.sql.Date classes also store the date/time internally as milliseconds since 1970 UTC; they carry no time zone of their own.
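
A minimal sketch of that point: the millisecond value inside a Date stays the same no matter which default time zone is active. The class name DateIsJustMillis and the helper millisUnderZone() are invented for this example, and the constant is my own calculation of March 31st 2013, 00:00 UTC:

```java
import java.util.Date;
import java.util.TimeZone;

class DateIsJustMillis {

    /** Creates a Date under the given default zone and returns its raw millisecond value. */
    static long millisUnderZone(long millis, String zoneId) {
        TimeZone previous = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return new Date(millis).getTime();
        } finally {
            // Restore the original default so the rest of the JVM is unaffected.
            TimeZone.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        long instant = 1364688000000L; // assumed: 2013-03-31 00:00 UTC
        System.out.println(millisUnderZone(instant, "GMT"));
        System.out.println(millisUnderZone(instant, "CET"));
    }
}
```

Both lines print the same number; only the textual representation of a Date depends on the zone.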

The time zone comes into play when you format a date/time for output or when you parse a date/time from a string. The default time zone can be set through the -Duser.timezone system property during startup, or alternatively by calling the TimeZone.setDefault() method.
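
Instead of changing the default, a formatter can also be given its own time zone. The following sketch formats one and the same instant for different zones. The class name FormatInZone, the helper format() and the chosen pattern are assumptions for illustration, and the constant is my own calculation of March 31st 2013, 00:00 UTC:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class FormatInZone {

    /** Formats the given instant as local date, time and numeric UTC offset for one zone. */
    static String format(long millis, String zoneId) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm Z", Locale.ENGLISH);
        f.setTimeZone(TimeZone.getTimeZone(zoneId));
        return f.format(new Date(millis));
    }

    public static void main(String[] args) {
        long instant = 1364688000000L; // assumed: 2013-03-31 00:00 UTC
        System.out.println(format(instant, "GMT")); // 2013-03-31 00:00 +0000
        System.out.println(format(instant, "CET")); // 2013-03-31 01:00 +0100
    }
}
```

The long value never changes here; only the zone handed to the formatter decides which wall-clock time is printed.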

You can test it using a small test program. In most European countries, the switch to DST happens on March 31st 2013 at 2am.
So here is a small program that loops 5 times, starting on March 30th at 11pm, in one-hour steps:

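    import java.util.Calendar;
    import java.util.Date;

    public class DstTest {
        public static void main(String[] args) {
            // Start at March 30th 2013, 11pm in the default time zone.
            Calendar c = Calendar.getInstance();
            c.set(2013, Calendar.MARCH, 30, 23, 0, 0);
            long oneHour = 1000L * 60 * 60;
            long t = c.getTimeInMillis();
            // Print five timestamps, each one hour after the previous one.
            for (int i = 0; i < 5; i++) {
                System.out.println(new Date(t));
                t += oneHour;
            }
        }
    }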

Starting it with -Duser.country=DE -Duser.timezone=GMT it will print the following:

Sat Mar 30 23:00:00 GMT 2013
Sun Mar 31 00:00:00 GMT 2013
Sun Mar 31 01:00:00 GMT 2013
Sun Mar 31 02:00:00 GMT 2013
Sun Mar 31 03:00:00 GMT 2013

Note that there is no switch.

Starting it with -Duser.country=DE -Duser.timezone=CET it will print the following:

Sat Mar 30 23:00:00 CET 2013
Sun Mar 31 00:00:00 CET 2013
Sun Mar 31 01:00:00 CET 2013
Sun Mar 31 03:00:00 CEST 2013
Sun Mar 31 04:00:00 CEST 2013

Note the Central European Time switch between 1am and 3am.

Starting it with -Duser.country=DE -Duser.timezone=EET it will print the following:

Sat Mar 30 23:00:00 EET 2013
Sun Mar 31 00:00:00 EET 2013
Sun Mar 31 01:00:00 EET 2013
Sun Mar 31 02:00:00 EET 2013
Sun Mar 31 04:00:00 EEST 2013

Note the Eastern European Time switch between 2am and 4am, because EET is CET plus one hour.
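
The same experiment can be run without any -D flags by using the newer java.time API (Java 8 and later) instead of Calendar and Date. This is only a sketch of an alternative, not the program used above. The class name DstStepsJavaTime and the helper hourlySteps() are made up, and I use the region ID Europe/Berlin on the assumption that it follows the same rules as CET for these dates:

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class DstStepsJavaTime {

    /** Steps forward in one-hour increments from March 30th 2013, 11pm in the given zone. */
    static List<String> hourlySteps(String zoneId, int count) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm xxx");
        ZonedDateTime start = ZonedDateTime.of(2013, 3, 30, 23, 0, 0, 0, ZoneId.of(zoneId));
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // plusHours() moves along the instant time-line, so the skipped hour never shows up.
            lines.add(start.plusHours(i).format(f));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : hourlySteps("Europe/Berlin", 5)) {
            System.out.println(line);
        }
    }
}
```

For Europe/Berlin the fourth line jumps from 01:00 +01:00 straight to 03:00 +02:00, matching the CET run above.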
