When deidentifying personal data stored as structured data, it makes little difference to privacy whether real names are deleted or replaced with random names.
For structured data in databases, removing the name column is a necessary step in deidentification, but filling that column with random names instead of leaving it out changes privacy very little.
In some cases, such as testing software that handles patient data, replacing real names with randomly generated ones is useful because the data still contains realistic-looking names.
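A minimal Python sketch of the two options for structured data, assuming each record is a plain dictionary with a `name` field. The name pools and the functions `drop_name_column` and `synthesize_name_column` are hypothetical, made up for illustration. Because the synthetic names are chosen independently of the real ones, neither output reveals anything about the patients' actual names.

```python
import random

# Hypothetical name pools for illustration; a real generator would draw
# from much larger lists.
FIRST_NAMES = ["Alice", "Ben", "Carmen", "Dmitri", "Eun-ji", "Farah"]
LAST_NAMES = ["Garcia", "Nguyen", "Okafor", "Schmidt", "Tanaka", "Walsh"]


def drop_name_column(records, column="name"):
    """Return new records with the name field removed."""
    return [{k: v for k, v in r.items() if k != column} for r in records]


def synthesize_name_column(records, column="name", seed=None):
    """Return new records whose names are replaced by random synthetic names.

    The synthetic names are drawn independently of the real ones, so for
    privacy this is about the same as dropping the column; it just keeps
    realistic-looking values around, e.g. for software testing.
    """
    rng = random.Random(seed)
    out = []
    for r in records:
        fake = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        out.append({**r, column: fake})  # copy the record, overwrite the name
    return out


patients = [
    {"name": "John Smith", "dob": "1970-01-01", "diagnosis": "J45"},
    {"name": "Mary Jones", "dob": "1985-06-15", "diagnosis": "E11"},
]

print(drop_name_column(patients))
# [{'dob': '1970-01-01', 'diagnosis': 'J45'}, {'dob': '1985-06-15', 'diagnosis': 'E11'}]
print(synthesize_name_column(patients, seed=42))
```

Both functions build new dictionaries, so the original records are left untouched.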
For unstructured data such as doctors' notes, replacing names with synthetic names is better for privacy than simply removing them.
The reason is that deidentification software may not find every name: if the names it does find are replaced with synthetic ones, any real names it misses are hard to tell apart from the fakes.
Synthetic names hide the failures of deidentification software better than a 'NAME REDACTED' placeholder does, so errors are less obvious when unstructured text is deidentified this way.
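The difference between a placeholder and synthetic names can be sketched in a few lines of Python. The detector is simulated by a hard-coded list of the names it "found"; real deidentification software would locate names itself (for example with a named-entity recognizer), so `replace_names`, `SYNTHETIC_NAMES`, and the sample note are hypothetical. The simulated detector misses "Ana Ruiz".

```python
import random

# Hypothetical pool of synthetic names used as replacements.
SYNTHETIC_NAMES = ["Laura Chen", "Omar Haddad", "Priya Nair", "Tom Becker"]


def replace_names(text, detected_names, synthetic=True, seed=None):
    """Replace every occurrence of each detected name.

    Each detected name gets one replacement used for all of its occurrences.
    Names the detector did not report are left in the text unchanged.
    """
    rng = random.Random(seed)
    for name in detected_names:
        if synthetic:
            replacement = rng.choice(SYNTHETIC_NAMES)
        else:
            replacement = "NAME REDACTED"
        text = text.replace(name, replacement)
    return text


note = "John Smith was seen today. Spoke with his daughter Ana Ruiz about discharge."
detected = ["John Smith"]  # simulated detector output: "Ana Ruiz" was missed

print(replace_names(note, detected, synthetic=False))
# NAME REDACTED was seen today. Spoke with his daughter Ana Ruiz about discharge.
print(replace_names(note, detected, seed=1))
```

With the placeholder, the missed real name is the only name left in the note and stands out; with synthetic names, a reader sees two plausible names and cannot tell which one is real.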
Synthesizing names rather than removing them is the better choice, provided the process has a sufficiently low error rate and is unbiased.
In practice, deidentification software has a nonzero error rate, but the rate does not need to be zero to comply with privacy regulations such as HIPAA.
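A minimal sketch of how the error rate might be measured, assuming an annotated evaluation set in which each true name is tagged with a group label and marked as detected or missed. The groups "A" and "B" and the counts are made up. Reading "unbiased" as "the miss rate is roughly the same across groups" is an assumption of this sketch, not a definition from the text above.

```python
from collections import defaultdict


def miss_rates(results):
    """Compute the overall miss rate and the miss rate per group.

    `results` is a list of (group, detected) pairs, one per true name in
    the evaluation set; `detected` is True if the software found the name.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for group, detected in results:
        totals[group] += 1
        if not detected:
            misses[group] += 1
    overall = sum(misses.values()) / sum(totals.values())
    per_group = {g: misses[g] / totals[g] for g in totals}
    return overall, per_group


# Hypothetical evaluation data: 100 names from each of two made-up groups.
results = (
    [("A", True)] * 99 + [("A", False)] * 1
    + [("B", True)] * 95 + [("B", False)] * 5
)
overall, per_group = miss_rates(results)
print(overall)    # 0.03
print(per_group)  # {'A': 0.01, 'B': 0.05}
```

Here the overall miss rate looks low, but names from group B are missed five times as often as names from group A, so the real names that survive would come disproportionately from one group.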