Unescape a String that contains standard Java escape sequences
September 28, 2013
I came across a problem where I needed to parse a String (coming from a database, file or web service) that contains standard Java escape sequences and convert those sequences to the characters they represent.
Java escape sequences can be placed in a string literal in four ways:
- Standard escapes with \b \f \n \r \t \" \' : These represent the standard control characters BS, FF, NL, CR and TAB, plus the double and single quote.
- Octal escapes with \0 to \377 : These represent a single character (0-255 decimal, 0x00-0xff in hexadecimal) in octal notation.
- Hexadecimal Unicode escapes with \uXXXX : A four-digit hexadecimal representation of a Unicode character.
- The \ character itself with \\ : Use this when you want a literal \ character.
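To make the four forms concrete, here is a small sketch of how they look inside Java source. The class name EscapeFormsDemo and its constants are made up for this illustration; the point is only that different notations can denote the same character once the compiler has translated them.

```java
final class EscapeFormsDemo {
    // Each literal below is translated by the compiler into a single character.
    static final String STANDARD = "\t";     // TAB, character code 9
    static final String OCTAL = "\101";      // octal 101 = decimal 65 = 'A'
    static final String UNICODE = "\u0041";  // hex 0041 = decimal 65 = 'A'
    static final String BACKSLASH = "\\";    // one literal backslash

    public static void main(String[] args) {
        System.out.println(OCTAL.equals(UNICODE));     // true
        System.out.println((int) STANDARD.charAt(0));  // 9
        System.out.println(BACKSLASH.length());        // 1
    }
}
```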
Though this is an integral part of the Java compiler when compiling code, the standard Java runtime has no function that converts such an escaped String notation into an unescaped target String.
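The difference between compile time and runtime can be sketched as follows. The class RuntimeEscapeDemo and its two methods are hypothetical; external() merely simulates text that arrived from a file or database with the backslash still in it.

```java
final class RuntimeEscapeDemo {
    // In source code the compiler turns backslash + n into one newline character.
    static String compiled() {
        return "a\nb";
    }

    // Simulates text read at runtime: it holds the two characters '\' and 'n'.
    // The doubled backslash is only needed to write it down as a literal here.
    static String external() {
        return "a\\nb";
    }

    public static void main(String[] args) {
        System.out.println(compiled().length());            // 3
        System.out.println(external().length());            // 4
        System.out.println(compiled().equals(external()));  // false
    }
}
```

The runtime string stays four characters long until some code translates the escape explicitly.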
Apache Commons Lang has a StringEscapeUtils class that can do this, but it brings a lot of overhead for such a small task: you need to bundle a rather large JAR and comply with the attached license. Depending on the version, the Apache Commons implementation also lacks some functionality and has several implementation flaws.
Here’s a quick solution in one method, also available as a Gist on GitHub:
/**
 * Unescapes a string that contains standard Java escape sequences.
 * <ul>
 * <li><strong>\b \f \n \r \t \" \'</strong> :
 * BS, FF, NL, CR, TAB, double and single quote.</li>
 * <li><strong>\X \XX \XXX</strong> : Octal character
 * specification (0 - 377, 0x00 - 0xFF).</li>
 * <li><strong>&#92;uXXXX</strong> : Hexadecimal based Unicode character.</li>
 * </ul>
 *
 * @param st
 *            A string optionally containing standard Java escape sequences.
 * @return The translated string.
 */
public String unescapeJavaString(String st) {
    StringBuilder sb = new StringBuilder(st.length());
    for (int i = 0; i < st.length(); i++) {
        char ch = st.charAt(i);
        if (ch == '\\') {
            // A trailing backslash has no follower and is kept as a literal backslash.
            char nextChar = (i == st.length() - 1) ? '\\' : st.charAt(i + 1);
            // Octal escape?
            if (nextChar >= '0' && nextChar <= '7') {
                String code = "" + nextChar;
                i++;
                if ((i < st.length() - 1) && st.charAt(i + 1) >= '0'
                        && st.charAt(i + 1) <= '7') {
                    code += st.charAt(i + 1);
                    i++;
                    // A third digit is only valid when the first is 0-3 (maximum \377).
                    if (nextChar <= '3' && (i < st.length() - 1)
                            && st.charAt(i + 1) >= '0' && st.charAt(i + 1) <= '7') {
                        code += st.charAt(i + 1);
                        i++;
                    }
                }
                sb.append((char) Integer.parseInt(code, 8));
                continue;
            }
            switch (nextChar) {
            case '\\':
                ch = '\\';
                break;
            case 'b':
                ch = '\b';
                break;
            case 'f':
                ch = '\f';
                break;
            case 'n':
                ch = '\n';
                break;
            case 'r':
                ch = '\r';
                break;
            case 't':
                ch = '\t';
                break;
            case '\"':
                ch = '\"';
                break;
            case '\'':
                ch = '\'';
                break;
            // Hex Unicode: u????
            case 'u':
                // Fewer than four characters follow the 'u': emit a plain 'u'.
                if (i >= st.length() - 5) {
                    ch = 'u';
                    break;
                }
                int code = Integer.parseInt(
                        "" + st.charAt(i + 2) + st.charAt(i + 3)
                                + st.charAt(i + 4) + st.charAt(i + 5), 16);
                sb.append(Character.toChars(code));
                i += 5;
                continue;
            }
            i++;
        }
        sb.append(ch);
    }
    return sb.toString();
}
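One detail worth spelling out is how the 'u' case copes with characters outside the Basic Multilingual Plane. Such a character is written as two consecutive escapes, one per surrogate. Because each escape is decoded and appended on its own, the two halves end up side by side in the StringBuilder and form a valid pair. The sketch below isolates just that step; the class SurrogatePairDemo and its helpers decodeHex4 and decodePair are hypothetical names, not part of the method above.

```java
final class SurrogatePairDemo {
    // Decodes four hex digits the same way the 'u' case does:
    // Integer.parseInt with radix 16, then Character.toChars.
    static char[] decodeHex4(String hex) {
        return Character.toChars(Integer.parseInt(hex, 16));
    }

    // Appends two separately decoded escapes, as the loop would on two iterations.
    static String decodePair(String high, String low) {
        StringBuilder sb = new StringBuilder();
        sb.append(decodeHex4(high)).append(decodeHex4(low));
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = decodePair("D83D", "DE00");
        System.out.println(s.length());                             // 2
        System.out.println(s.codePointCount(0, s.length()));        // 1
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // 1f600
    }
}
```

Note that Integer.parseInt throws a NumberFormatException when the four characters are not valid hexadecimal digits, so callers handling untrusted input may want to catch it.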
