Unescape a String that contains standard Java escape sequences
September 28, 2013 1 Comment
I came across a problem that I needed to parse a String (coming from a database, file or web service) that contains standard Java escape sequences and had to convert the escape sequences to the proper characters.
Java escape sequences can be placed in a string literal in three ways:
- Standard escapes with \b \f \n \r \t \” \’ : These represent the standard control characters BS, FF, NL, CR, TAB and the double and single quote.
- Octal escapes with \0 to \377 : These represent a single character (0-255 decimal, 0x00-0xff in hexadecimal) in octal notation.
- Hexadecimal Unicode character with \uXXXX : A hexadecimal representation of a Unicode character.
- The \ character itself with \\ : If you want to type the \ character itself.
Though this an integral part of the Java compiler when compiling code, there is no function in the standard Java runtime, that will convert such a String notation into an unescaped target String.
Apache Commons has a StringUtils class, that can do this, but this requires a lot of overhead for just a small task. You need a rather large JAR to bind and have a license attached. Depending on the version, the Apache Commons implementation also lacks some functionality and has several implementation flaws.
Here’s a quick solution in one method, also available as a Gist on GitHub:
/** * Unescapes a string that contains standard Java escape sequences. * <ul> * <li><strong>\b \f \n \r \t \" \'</strong> : * BS, FF, NL, CR, TAB, double and single quote.</li> * <li><strong>\X \XX \XXX</strong> : Octal character * specification (0 - 377, 0x00 - 0xFF).</li> * <li><strong>\uXXXX</strong> : Hexadecimal based Unicode character.</li> * </ul> * * @param st * A string optionally containing standard java escape sequences. * @return The translated string. */ public String unescapeJavaString(String st) { StringBuilder sb = new StringBuilder(st.length()); for (int i = 0; i < st.length(); i++) { char ch = st.charAt(i); if (ch == '\\') { char nextChar = (i == st.length() - 1) ? '\\' : st .charAt(i + 1); // Octal escape? if (nextChar >= '0' && nextChar <= '7') { String code = "" + nextChar; i++; if ((i < st.length() - 1) && st.charAt(i + 1) >= '0' && st.charAt(i + 1) <= '7') { code += st.charAt(i + 1); i++; if ((i < st.length() - 1) && st.charAt(i + 1) >= '0' && st.charAt(i + 1) <= '7') { code += st.charAt(i + 1); i++; } } sb.append((char) Integer.parseInt(code, 8)); continue; } switch (nextChar) { case '\\': ch = '\\'; break; case 'b': ch = '\b'; break; case 'f': ch = '\f'; break; case 'n': ch = '\n'; break; case 'r': ch = '\r'; break; case 't': ch = '\t'; break; case '\"': ch = '\"'; break; case '\'': ch = '\''; break; // Hex Unicode: u???? case 'u': if (i >= st.length() - 5) { ch = 'u'; break; } int code = Integer.parseInt( "" + st.charAt(i + 2) + st.charAt(i + 3) + st.charAt(i + 4) + st.charAt(i + 5), 16); sb.append(Character.toChars(code)); i += 5; continue; } i++; } sb.append(ch); } return sb.toString(); }
Thank you so so much :D I searched and searched for a solution that actually works until I stumbled across this post. Works like a charm! (and it probably kept me from smashing my keyboard at some point :P )