Japanese CSV Name List Cleaner

Drop a CSV with Japanese names and data. The tool NFKC-normalizes zenkaku/hankaku characters, trims whitespace, normalizes phone numbers, lowercases emails, and removes duplicate rows — all in your browser, nothing is uploaded.

Drop a CSV file here or click to browse

NFKC normalization and deduplication apply to all columns regardless of the column selection. Selecting the name, phone, and email columns enables the extra cleaning steps for those columns.

Results

Total rows: 0
Rows changed: 0
Duplicates removed: 0
Output rows: 0

Preview (first 50 rows · green = changed, red badge = duplicate removed):

How it works

  1. Parse CSV: PapaParse reads the file in your browser. UTF-8 input (with or without a BOM) is expected; see the FAQ for Shift-JIS files. The header row is preserved as-is.
  2. NFKC normalize all columns: Unicode NFKC (compatibility decomposition followed by canonical composition) converts full-width (zenkaku) ASCII letters, digits, and symbols to half-width (hankaku), and converts full-width spaces to ordinary spaces. Example: Ａｂｃ１２３ → Abc123.
  3. Name column cleanup: Leading and trailing whitespace is trimmed, and runs of multiple spaces are collapsed to a single space. Example: " 田中  太郎" → "田中 太郎".
  4. Phone column normalization: Hyphens, parentheses, spaces, and dots are removed; only digits (and a leading +) are kept. Example: 03-1234-5678 → 0312345678.
  5. Email column lowercase: Email addresses are converted to lowercase. Example: User@Example.COM → user@example.com.
  6. Deduplicate: Rows whose every field is identical (after cleaning) are removed; only the first occurrence is kept. The count of removed rows is shown in the summary.
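
The per-column cleaning in steps 2 to 5 can be sketched in a few lines of JavaScript. This is an illustration under assumptions, not the tool's actual source: the helper names (nfkc, cleanName, cleanPhone, cleanEmail, cleanRow) and the shape of the `columns` mapping are hypothetical, and only the built-in String.prototype.normalize is relied on.

```javascript
// Hypothetical helpers; the tool's real implementation is not shown on this page.

// Step 2: NFKC folds full-width ASCII and the ideographic space (U+3000)
// into their ordinary half-width forms.
const nfkc = (value) => String(value).normalize('NFKC');

// Step 3: trim both ends, then collapse runs of spaces to a single space.
const cleanName = (value) => nfkc(value).trim().replace(/ {2,}/g, ' ');

// Step 4: keep digits only, preserving a leading "+" if there is one.
function cleanPhone(value) {
  const text = nfkc(value).trim();
  const digits = text.replace(/\D/g, '');
  return text.startsWith('+') ? `+${digits}` : digits;
}

// Step 5: lowercase the whole address.
const cleanEmail = (value) => nfkc(value).trim().toLowerCase();

// Clean one parsed row (an object keyed by header name).
// `columns` maps a header to 'name' | 'phone' | 'email' (assumed shape);
// any other column only gets NFKC normalization.
function cleanRow(row, columns) {
  const cleaners = { name: cleanName, phone: cleanPhone, email: cleanEmail };
  const out = {};
  for (const [header, value] of Object.entries(row)) {
    const clean = cleaners[columns[header]] ?? nfkc;
    out[header] = clean(value);
  }
  return out;
}

const cleaned = cleanRow(
  { name: ' 田中  太郎', tel: '（０３）１２３４－５６７８', mail: 'User@Example.COM' },
  { name: 'name', tel: 'phone', mail: 'email' }
);
console.log(cleaned);
```

Running NFKC first keeps the later steps simple: by the time the phone cleaner runs, full-width digits, hyphens, and parentheses have already been folded to ASCII, so a single digit filter covers both forms.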

Frequently asked questions

What is NFKC normalization and why does it matter for Japanese CSVs?
NFKC (Unicode Normalization Form KC) replaces "compatibility equivalent" characters with their standard forms. In practice, this means full-width (zenkaku) Latin letters and digits typed with a Japanese input method, such as Ａ (U+FF21) or １ (U+FF11), are converted to their ordinary half-width equivalents (A, 1). This matters for Japanese name lists because the same record might be stored with zenkaku digits in one row and hankaku digits in another, so the two rows compare as different even though they describe the same person. NFKC normalization is a common pre-processing step before deduplication and database import.
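
To see the effect concretely, the following JavaScript sketch prints code points before and after normalization. The function names codePoints and describeNfkc are made up for this example; String.prototype.normalize('NFKC') is the standard built-in. Note that NFKC also works in the other direction for katakana: half-width katakana such as ﾀﾅｶ becomes full-width タナカ.

```javascript
// List each character of `text` as a U+XXXX code point.
const codePoints = (text) =>
  Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

// Hypothetical helper: show a string before and after NFKC.
function describeNfkc(text) {
  const normalized = text.normalize('NFKC');
  return { before: codePoints(text), after: codePoints(normalized), normalized };
}

const zenkaku = 'Ａ１'; // U+FF21, U+FF11
const hankaku = 'A1';   // U+0041, U+0031

console.log(zenkaku === hankaku);                   // false
console.log(zenkaku.normalize('NFKC') === hankaku); // true
console.log(describeNfkc(zenkaku));
```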
Is my CSV data sent to any server?
No. The entire process runs inside your browser using JavaScript. Your file is read locally via the File API and never leaves your device. This is especially important for name lists that contain personal information (個人情報) — there is no upload, no logging, and no third-party data transfer.
What encoding does the tool support? What about Shift-JIS files exported from Excel?
The tool reads the file with PapaParse and handles UTF-8, including UTF-8 with a BOM (common in Excel CSV exports). If your file is Shift-JIS, re-save it from Excel as "CSV UTF-8 (comma delimited)" before dropping it here, or open the file in a text editor and save it with UTF-8 encoding. Most modern CRM and database exports already use UTF-8.
Does deduplication look at all columns or just the name column?
Deduplication checks every column in each row. Two rows are considered duplicates only when all their fields are identical after cleaning. This avoids accidentally removing two people who share the same name but have different phone numbers or email addresses.
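
A minimal sketch of whole-row deduplication, assuming rows arrive as arrays of already-cleaned strings. The function name dedupeRows and the use of JSON.stringify as the row key are illustrative choices, not necessarily what the tool does internally.

```javascript
// Remove rows whose every field matches an earlier row; keep the first occurrence.
function dedupeRows(rows) {
  const seen = new Set();
  const kept = [];
  let removed = 0;
  for (const row of rows) {
    // JSON.stringify gives an unambiguous key: ["a,b"] and ["a","b"] stay distinct.
    const key = JSON.stringify(row);
    if (seen.has(key)) {
      removed += 1;
    } else {
      seen.add(key);
      kept.push(row);
    }
  }
  return { kept, removed };
}

const result = dedupeRows([
  ['田中 太郎', '0312345678'],
  ['田中 太郎', '0312345678'],  // exact duplicate: removed
  ['田中 太郎', '09011112222'], // same name, different phone: kept
]);
console.log(result.removed);     // 1
console.log(result.kept.length); // 2
```

Keying on the whole row rather than on the name alone is what keeps the third record in this example.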
What happens to phone numbers that already use hyphens or brackets?
The phone cleaning step strips hyphens (-, －), parentheses ((), （）), spaces, and dots, leaving only digits. A leading + for international dialling codes is preserved. So (03)1234-5678, 03-1234-5678, and 0312345678 all become 0312345678, making deduplication far more effective.