- SQL
- Structured Query Language. The standard language for defining, querying, and modifying relational databases. A .sql file is just text containing commands like CREATE TABLE and INSERT INTO.
- SQL dump
- A text file exported from a database that recreates it exactly — the tables (CREATE TABLE) and every row of data (INSERT INTO ...).
- CSV
- Comma-Separated Values. A plain-text spreadsheet where each line is a row and columns are separated by commas.
- Row
- One record in a table — for example, one patient.
- Column
- One field on every row — for example, first_name or email.
- Primary key (PK)
- The column whose value uniquely identifies each row (usually 'id'). DeIdentify uses PK values to keep foreign-key links consistent after renaming.
- Foreign key (FK)
- A column that points to a row in another table. Example: encounters.patient_id points to patients.id. DeIdentify keeps these links intact.
- PII
- Personally Identifiable Information — any data that can identify a real person: names, emails, phone numbers, addresses, SSNs, etc.
- PHI
- Protected Health Information — PII combined with anything about a person's health, care, or payment for care. Governed by HIPAA in the US.
- De-identification
- Replacing PII/PHI with realistic-looking but fake values so the data is safe to share for testing, analytics, or research.
- Pseudonymization
- A form of de-identification where each real value is mapped to a stable fake one (Alice → Marta every time). Reversible only if the mapping is kept.
- Anonymization
- Stronger than pseudonymization — no mapping is kept and re-identification is meant to be infeasible.
- HIPAA Safe Harbor
- One of two HIPAA methods for de-identifying PHI. Removes or generalizes 18 specific identifier categories (names, dates more specific than the year, geographic detail below the state level, ID numbers, etc.). The built-in preset applies all 18.
- Expert Determination
- The other HIPAA method — a statistician certifies re-identification risk is very small. DeIdentify supports this workflow with per-entity date shifts and generalization.
- Salt
- A short secret string mixed into hashes and fake generators. The same input still maps to the same output within one run, but nobody can reproduce that mapping without knowing the salt. Changing the salt gives you a totally new, still-consistent mapping.
- Hash
- A one-way function that turns a value into a fixed-length code. Deterministic (same input → same output) but not reversible without brute force.
- Preflight
- The pre-check DeIdentify runs before rewriting. Flags missing primary keys, INSERTs without column lists, and statement types the tool can't rewrite.
- Preset
- A saved bundle of column strategies (e.g. 'HIPAA Safe Harbor'). One click and every recognized column gets a sensible default.
- Strategy
- What to do with a column's values: Keep, Fake, Hash, Shift, Generalize, or Redact.
- Date shift
- Move every date forward or backward by a random-but-consistent number of days. Preserves intervals (admission → discharge stays 4 days) while hiding the real calendar.
- Per-entity shift
- The same shift is applied to every date belonging to one patient. Different patients get different shifts. Recommended for HIPAA Expert Determination.
- Per-table shift
- The same shift is applied to every date in one table. Simpler; leaks less about individuals when a table has many people.
- Generalize
- Replace an exact value with a broader bucket. Example: '78 years old' → '75-84', '2024-03-18' → '2024', '10024' → '100XX'.
- Redact / Null
- Drop the value entirely — set it to NULL or an empty string. Use when the field is not needed downstream.
- Referential integrity
- The guarantee that foreign keys still line up after rewriting. When a patients.id of 42 is rewritten to 7891, every encounters.patient_id of 42 is also rewritten to 7891.
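
Several of the terms above (Salt, Hash, the Fake and Generalize strategies, Per-entity shift, and Referential integrity) can be illustrated with a minimal Python sketch. This is not DeIdentify's implementation. The function names, the HMAC-SHA-256 construction, the small fake-name pool, the ±365-day shift window, and the key-renumbering scheme are all assumptions chosen for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical pool of replacement names; a real tool would use a far larger one.
FAKE_NAMES = ["Marta", "Jonas", "Priya", "Tomas", "Aiko", "Lena"]


def keyed_digest(salt: str, value: str) -> bytes:
    """Salted one-way hash (HMAC-SHA-256): same salt + same input -> same bytes."""
    return hmac.new(salt.encode(), value.encode(), hashlib.sha256).digest()


def hash_value(value: str, salt: str) -> str:
    """'Hash' strategy: a fixed-length hex code derived from the value."""
    return keyed_digest(salt, "hash:" + value).hex()[:16]


def fake_name(name: str, salt: str) -> str:
    """'Fake' strategy: a stable pseudonym. Distinct inputs may share a fake name."""
    n = int.from_bytes(keyed_digest(salt, "name:" + name)[:8], "big")
    return FAKE_NAMES[n % len(FAKE_NAMES)]


def shift_days(entity_id: int, salt: str, max_days: int = 365) -> int:
    """Per-entity shift: one offset in [-max_days, +max_days] per patient."""
    n = int.from_bytes(keyed_digest(salt, f"shift:{entity_id}")[:8], "big")
    return n % (2 * max_days + 1) - max_days


def shift_date(d: date, entity_id: int, salt: str) -> date:
    """Apply the patient's offset to one of that patient's dates."""
    return d + timedelta(days=shift_days(entity_id, salt))


def generalize_age(age: int) -> str:
    """10-year buckets aligned as in the glossary example (78 -> '75-84')."""
    if age < 5:
        return "0-4"
    low = (age - 5) // 10 * 10 + 5
    return f"{low}-{low + 9}"


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits and mask the rest ('10024' -> '100XX')."""
    return zip_code[:3] + "X" * (len(zip_code) - 3)


def build_key_map(old_ids, salt: str, start: int = 1000) -> dict:
    """Collision-free PK remap: order ids by salted hash, then renumber."""
    ordered = sorted(set(old_ids), key=lambda i: keyed_digest(salt, f"pk:{i}"))
    return {old: new for new, old in enumerate(ordered, start=start)}


SALT = "example-salt"  # placeholder; a real salt must be kept secret
patients = [{"id": 42, "first_name": "Alice", "age": 78, "zip": "10024"},
            {"id": 43, "first_name": "Bob", "age": 31, "zip": "94110"}]
encounters = [{"patient_id": 42, "admitted": date(2024, 3, 18),
               "discharged": date(2024, 3, 22)}]

key_map = build_key_map([p["id"] for p in patients], SALT)
new_patients = [{"id": key_map[p["id"]],
                 "first_name": fake_name(p["first_name"], SALT),
                 "age": generalize_age(p["age"]),
                 "zip": generalize_zip(p["zip"])} for p in patients]
new_encounters = [{"patient_id": key_map[e["patient_id"]],  # FK follows the PK
                   "admitted": shift_date(e["admitted"], e["patient_id"], SALT),
                   "discharged": shift_date(e["discharged"], e["patient_id"], SALT)}
                  for e in encounters]
```

Because the shift is derived from the patient id rather than from each individual date, the four-day stay survives the rewrite. The same key map is applied to both patients.id and encounters.patient_id, so the foreign key still resolves, and changing `SALT` yields a different but equally consistent mapping.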