Invisible Character Remover

Detect and remove zero-width, soft hyphen and other invisible Unicode characters

What is it and how does it work?

Invisible characters are Unicode code points that take up no visible space but can cause significant issues in text processing, databases, APIs, and user interfaces. The most common culprits: Zero Width Space (U+200B), Zero Width No-Break Space (U+FEFF, also used as the BOM), Zero Width Joiner (U+200D) and Non-Joiner (U+200C), Left-to-Right Mark (U+200E) and Right-to-Left Mark (U+200F), Soft Hyphen (U+00AD), and various control characters (U+0000–U+001F, apart from intentional ones such as tab, line feed and carriage return). These characters are invisible in most text editors and browsers, making them extremely difficult to detect by eye.

This tool detects and removes invisible characters from text, highlighting exactly where they occur and which Unicode code point each one is. Common sources of invisible characters: copying text from PDFs (which often inject soft hyphens), pasting from Word documents (which can add BOM markers), copying from websites that insert zero-width spaces as line-break opportunities, or receiving text via APIs that include RTL/LTR control markers. Invisible characters can break regex patterns, cause string equality checks to fail, and corrupt database entries.
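The detect-and-report behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `INVISIBLE` set and the function names `find_invisible` and `remove_invisible` are hypothetical, and the set covers only the code points named in this section (control characters are left out here because tab and newline are usually intentional).

```python
import unicodedata

# Illustrative set of the code points named in the text; not exhaustive.
INVISIBLE = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u00ad",  # SOFT HYPHEN
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

def find_invisible(text):
    """Return (index, 'U+XXXX', Unicode name) for each invisible character."""
    hits = []
    for i, ch in enumerate(text):
        if ch in INVISIBLE:
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

def remove_invisible(text):
    """Return text with every character in INVISIBLE dropped."""
    return "".join(ch for ch in text if ch not in INVISIBLE)

sample = "pass\u200bword\u00ad"
for hit in find_invisible(sample):
    print(hit)
print(remove_invisible(sample) == "password")
```

Reporting the index and code point before deleting anything makes it possible to review each hit, which matters for characters that are sometimes legitimate.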

Frequently asked questions

What is the Unicode BOM (Byte Order Mark) and is it always safe to remove?

The BOM (U+FEFF) at the start of a UTF-8 file is technically an invisible character. In UTF-8 the BOM is unnecessary, because UTF-8 has no byte-order ambiguity, and it can cause problems: tools that do not expect it treat it as content, so it can show up as stray characters at the top of an HTML page or make a strict parser reject the input, and JavaScript files with a BOM break older parsers. You can safely remove the BOM from UTF-8 text and UTF-8 HTML files. For UTF-16 and UTF-32 files the BOM is meaningful for byte-order detection, so don't remove it there.
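As a rough sketch of BOM removal for UTF-8 input in Python: the helper names `strip_utf8_bom_bytes` and `strip_leading_bom` are hypothetical, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library. Only a leading U+FEFF is treated as a BOM here.

```python
import codecs

def strip_utf8_bom_bytes(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (bytes EF BB BF) from raw bytes, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_leading_bom(text: str) -> str:
    """Drop a single leading U+FEFF from already-decoded text."""
    return text[1:] if text.startswith("\ufeff") else text

raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain "utf-8" decoding keeps the BOM as a U+FEFF character.
print(raw.decode("utf-8") == "\ufeffhello")
# "utf-8-sig" drops a leading BOM while decoding.
print(raw.decode("utf-8-sig") == "hello")
print(strip_utf8_bom_bytes(raw) == b"hello")
```

When reading files, passing `encoding="utf-8-sig"` to `open()` is often the simplest option, since it accepts input both with and without a BOM.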

What are Zero Width Joiners used for legitimately?

Zero Width Joiner (U+200D) is legitimately used in emoji sequences: 👨‍👩‍👧‍👦 (family emoji) is actually four separate emoji joined by U+200D. Removing ZWJ from emoji sequences breaks them into their component emoji. ZWJ is also used in some South Asian scripts (Devanagari, etc.) to control glyph rendering. If your text contains emoji or South Asian script, review ZWJ removals carefully rather than blindly stripping all invisible characters.
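The effect on emoji can be seen by counting code points before and after stripping. In this sketch the `strip_invisible` function and its `keep` parameter are hypothetical, and the `INVISIBLE` set is a small illustrative subset; the family emoji is written with escapes so the joiners are visible in the source.

```python
ZWJ = "\u200d"

# Man, woman, girl, boy joined by three ZWJ characters (the family emoji).
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

# Illustrative subset of invisible characters; not exhaustive.
INVISIBLE = frozenset({"\u200b", "\u200c", "\u200d", "\u200e", "\u200f", "\u00ad", "\ufeff"})

def strip_invisible(text, keep=frozenset()):
    """Remove INVISIBLE characters except those listed in `keep`."""
    drop = INVISIBLE - set(keep)
    return "".join(ch for ch in text if ch not in drop)

print(len(family))                               # 7 code points: 4 emoji + 3 ZWJ
print(len(strip_invisible(family)))              # 4: the sequence is broken apart
print(strip_invisible(family, keep={ZWJ}) == family)

# Other invisible characters are still removed when ZWJ is kept.
mixed = "hi\u200b " + family
print(strip_invisible(mixed, keep={ZWJ}) == "hi " + family)
```

Keeping every ZWJ is a blunt rule: it also preserves stray joiners that are not part of an emoji or script sequence, so a manual review of the remaining hits is still worthwhile.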

Why do zero-width spaces appear in text from websites?

Zero Width Space (U+200B) is used in web typography as a "soft wrap opportunity": a point where the browser can break a long word across lines. Some websites, especially those displaying URLs or code inline, insert ZWSP to enable wrapping without a visible hyphen. Copy-pasting from such pages includes these characters. Some content management systems also insert ZWSP inadvertently. They are invisible on screen but break string matching.
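A short Python illustration of how a pasted URL with ZWSP wrap points fails to match its typed equivalent. The URL and variable names are made up for the example.

```python
ZWSP = "\u200b"

# A URL as it might be copied from a page that inserts ZWSP after each slash.
pasted = "https://example.com/\u200bvery/\u200blong/\u200bpath"
typed = "https://example.com/very/long/path"

print(pasted == typed)              # False: the strings look identical but differ
print(len(pasted) - len(typed))     # 3 extra code points
print("long/path" in pasted)        # False: substring search fails too

cleaned = pasted.replace(ZWSP, "")
print(cleaned == typed)             # True once the ZWSPs are removed
```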

How can I detect invisible characters programmatically?

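Regex: `/[\u200B-\u200D\uFEFF\u00AD\u200E\u200F]/g` covers the most common ones. More comprehensive: `/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\u00AD\u200B-\u200F\u2028\u2029\u202A-\u202E\uFEFF]/g`, which adds control characters (leaving tab, line feed and carriage return alone), the line and paragraph separators, and the bidirectional embedding controls. In Python, `unicodedata.category(c)` returns a character's general category: Cf (format) and Cc (control) cover most invisible characters, although Cc also includes tab and newline. Zs (space separator) covers blank characters, but it includes the ordinary space (U+0020) and the no-break space (U+00A0), so filter on it selectively. `unicodedata.east_asian_width()` is not a zero-width test: a result of "N" means Neutral, not zero visual width, and many visible characters return it too.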
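The JavaScript patterns above carry over to Python's `re` module almost unchanged. The sketch below is one possible translation: the names `COMMON`, `BROAD` and `invisible_by_category` are hypothetical, `BROAD` includes U+00AD so that it is a superset of `COMMON`, and the category scan deliberately checks only Cf and Cc, since Zs would also flag ordinary spaces.

```python
import re
import unicodedata

# Python's re module understands \xhh and \uXXXX escapes inside str patterns.
COMMON = re.compile(r"[\u200b-\u200d\ufeff\u00ad\u200e\u200f]")
BROAD = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f\u00ad"
    r"\u200b-\u200f\u2028\u2029\u202a-\u202e\ufeff]"
)

def invisible_by_category(text):
    """Return (index, 'U+XXXX', category) for format (Cf) and control (Cc) characters."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.category(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) in {"Cf", "Cc"}
    ]

sample = "a\u200bb\u00adc"
print(COMMON.sub("", sample))                 # abc
print(BROAD.sub("", "x\x00y\u202ez"))         # xyz

# Cc includes tab, so a category scan reports it alongside U+200B.
print(invisible_by_category("a\u200bb\tc"))

# Zs covers ordinary and no-break spaces, which is why it is not filtered here.
print(unicodedata.category(" "), unicodedata.category("\u00a0"))
```

The regex approach is explicit about what gets removed, while the category approach catches format characters that a hand-written character class might miss; which one fits depends on how much legitimate formatting the input is expected to contain.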
