Open Tool Generates Privacy-Safe Passenger Data for Aviation-Security Research
Every time you board a plane you leave a digital breadcrumb trail: who you are, where you sit, how you paid, who you travel with, where you stayed last night. This record – called a Passenger Name Record (PNR) – is gold dust for security analysts who hunt suspicious travel patterns. Yet real PNRs are locked behind strict privacy laws and commercial secrecy, starving researchers of the data they need to build smarter, fairer screening tools.
A team from the University of Sheffield has now released an open-source pipeline that fabricates millions of realistic, privacy-safe PNRs. The synthetic dataset mimics the statistical quirks of genuine traveller behaviour – down to the hour of day at which you are likely to buy a ticket – while keeping every single “passenger” fictitious. The work, presented at the ISCRAM 2025 conference, offers aviation-intelligence projects a safe sandbox in which to test machine-learning models, risk-scoring engines and what-if epidemic simulations without touching sensitive records.
Why not just anonymise the real thing?
Traditional anonymisation (removing names, hashing passport numbers) still leaks identity through combinations of attributes: your birthday, route and credit-card vendor, taken together, can re-identify you in milliseconds. Once data are open, a re-identification cannot be undone. Regulators therefore keep PNRs under lock and key, which slows innovation. Synthetic data sidestep the problem by generating entirely new people, flights and social networks that never existed.
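To make the combination problem concrete, here is a minimal sketch that measures how many records become unique once several harmless-looking fields are read together. The records, the field names and the `unique_fraction` helper are all made up for illustration; they are not taken from SynPNR or from any real PNR schema.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Toy, invented records with names and passport numbers already stripped.
records = [
    {"birth_date": "1984-03-12", "route": "CDG-ATH", "card_vendor": "Visa"},
    {"birth_date": "1984-03-12", "route": "CDG-ATH", "card_vendor": "Mastercard"},
    {"birth_date": "1991-07-30", "route": "CDG-ATH", "card_vendor": "Visa"},
    {"birth_date": "1991-07-30", "route": "ORY-SKG", "card_vendor": "Visa"},
]

# One field alone singles out few travellers...
print(unique_fraction(records, ["route"]))
# ...but the combination of three fields singles out many more.
print(unique_fraction(records, ["birth_date", "route", "card_vendor"]))
```

In this toy set, the route alone isolates one traveller in four, while the three fields combined isolate every one of them. Real datasets are far larger, but the effect tends to get stronger, not weaker, as more columns are joined.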
Five-step flight factory
The Sheffield framework, dubbed SynPNR, builds a fake world in layers:
- Population engine: Country-level census tables (UN & World Bank) are expanded into 4.7 million synthetic residents across 36 nations. Households get believable age gaps, marriage patterns and even transliterated names that fit local alphabets.
- Social glue: Algorithms weave family, friend and co-worker ties inside cities, then add weaker long-distance links. Shared nationality boosts link probability by 50 %, a nod to sociological studies on homophily.
- Agent minds: Every adult becomes an autonomous software agent with a calendar, bank card and frequent-flyer status. Agents decide how often they can travel (propensity), pick a purpose (business, holiday, family), choose companions from their network and select destinations weighted by real 2019 flight capacities.
- Seat allocator: Valid airline routes (minimum 210 operating days/year) are loaded from OpenSky crowd-sourced data. The engine books parties into flights, respects seat availability, adds baggage drawn from purpose-specific distributions and timestamps the purchase.
- XML printer: Output is a stack of industry-standard PNR files that plug straight into existing analyst tools.
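Three of the rules in these layers are simple enough to sketch in a few lines of Python: the homophily boost, the 210-day route filter and the capacity-weighted choice of destination. This is an illustration only, not SynPNR's actual code. The function names, the route records and the seat counts are hypothetical, and reading "boosts link probability by 50 %" as a 1.5x multiplier is an assumption.

```python
import random

# Hypothetical names and numbers throughout; not taken from the SynPNR repository.


def tie_probability(base_prob, same_nationality, boost=1.5):
    """Chance of a social tie, assuming the 50 % homophily boost is a
    multiplier on the base probability (capped at 1.0)."""
    p = base_prob * boost if same_nationality else base_prob
    return min(p, 1.0)


def eligible_routes(routes, min_operating_days=210):
    """Keep only routes flown on at least `min_operating_days` days per year."""
    return [r for r in routes if r["operating_days"] >= min_operating_days]


def pick_destination(capacities, rng):
    """Draw one destination with probability proportional to seat capacity."""
    destinations = list(capacities)
    weights = [capacities[d] for d in destinations]
    return rng.choices(destinations, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the toy run is repeatable

# Invented route table; the seat counts are placeholders, not 2019 statistics.
routes = [
    {"origin": "CDG", "dest": "ATH", "operating_days": 340, "seats": 900_000},
    {"origin": "CDG", "dest": "SKG", "operating_days": 250, "seats": 300_000},
    {"origin": "CDG", "dest": "JMK", "operating_days": 95, "seats": 60_000},
]

valid = eligible_routes(routes)  # the 95-day seasonal route is dropped
capacities = {r["dest"]: r["seats"] for r in valid}

print(tie_probability(0.10, same_nationality=True))
print([r["dest"] for r in valid])
print(pick_destination(capacities, rng))
```

Passing an explicit `random.Random` instance rather than using the module-level functions keeps each run reproducible, which matters when a synthetic dataset has to be regenerated and compared across releases.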
Does it look authentic?
The group benchmarked their 4250-flight French–Greek subset against 2019 Eurocontrol and national statistics:
| Metric | Real world | Synthetic |
|---|---|---|
| Average load factor | 57 % | 56.9 % |
| Hourly passenger flow shape | — | KLD = 0.015 (near-perfect match) |
| Solo travellers | 70 % | 83 % * |
| Business-trip length | 4 days | 3.7 days |
* Over-generation of solo trips is already flagged for tuning in the next release.
The synthetic age–sex pyramids for France and Greece differ from the official ones by a Kullback-Leibler divergence below 0.05, which is smaller than typical sampling error in official micro-censuses.
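For readers unfamiliar with the metric, the Kullback-Leibler divergence (KLD) between two binned distributions can be computed as in the sketch below. It assumes natural logarithms and the direction D(real || synthetic); the article does not say which log base or direction the authors used, and the `kl_divergence` helper and the sample bins are illustrative only.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in nats for two discrete distributions.

    `p` and `q` are equal-length lists of counts or shares; both are
    normalised to sum to 1 before comparison. `eps` guards against
    division by zero when a bin is empty in `q`.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    divergence = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / p_total, q_i / q_total
        if p_i > 0:  # bins with zero real mass contribute nothing
            divergence += p_i * math.log(p_i / max(q_i, eps))
    return divergence


# Made-up passenger counts in four time-of-day bins, for illustration only.
real = [120, 340, 410, 130]
synthetic = [110, 350, 400, 140]

print(kl_divergence(real, synthetic))
```

A value of zero means the two histograms are identical, and the value grows as the shapes drift apart, so small figures such as those quoted above indicate closely matching distributions.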
What can you do with it?
- Train risk-scoring AIs without GDPR headaches.
- Stress-test border-screening rules under synthetic epidemics or mass events.
- Model group-based threats (e.g., small cells booking separate flights).
- Share full datasets with external collaborators—no data-use agreements needed.
Limits & next hops
- Group travel is still too rare; social-network density will be dialled up.
- Seasonal travel dips in late summer are weaker than in reality.
- An expert usability review with border police is under way.
- Code is already on GitHub; a larger 20-million-passenger run is scheduled for late 2025.
Take-away
SynPNR reproduces the statistical soul of air-travel data while surgically removing privacy marrow. For researchers boxed out of real PNR vaults, fake flights may finally unlock real insights.
Try it yourself
Code & sample data: github.com/fafadlian/Synthetic-PNR-Generation
Citation: Fadlian et al., “Generating Realistic Passenger Name Records with Privacy Compliance for Security Analysis”, ISCRAM 2025.