Open Tool Generates Privacy-Safe Passenger Data for Aviation-Security Research

Every time you board a plane you leave a digital breadcrumb trail: who you are, where you sit, how you paid, who you travel with, where you stayed last night. This record – called a Passenger Name Record (PNR) – is gold dust for security analysts who hunt suspicious travel patterns. Yet real PNRs are locked behind strict privacy laws and commercial secrecy, starving researchers of the data they need to build smarter, fairer screening tools.

A team from the University of Sheffield has now released an open-source pipeline that fabricates millions of realistic, privacy-safe PNRs. The synthetic dataset mimics the statistical quirks of genuine traveller behaviour – down to the hour of day you are likely to buy a ticket – while keeping every single “passenger” fictitious. The work, presented at the ISCRAM 2025 conference, offers aviation-intelligence projects a safe sandbox in which to test machine-learning models, risk-scoring engines and what-if epidemic simulations without touching sensitive records.


Why not just anonymise the real thing?

Traditional anonymisation (remove names, hash passport numbers) still leaks identity through combinations of the fields left behind: your birthday, route and credit-card vendor together can be enough to single you out. And once data are released, a re-identification cannot be undone. Regulators therefore keep PNRs under lock and key, which slows innovation. Synthetic data sidestep the problem by growing entirely new people, flights and social networks that never existed.
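
To make the combination problem concrete, here is a minimal Python sketch that measures how many records in a name-stripped table are still unique on a chosen set of fields. The records, field names and the helper unique_fraction are invented for illustration and are not part of SynPNR.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Toy, made-up records with names already removed (hypothetical field names).
records = [
    {"birth_date": "1985-03-02", "route": "CDG-ATH", "card_vendor": "Visa"},
    {"birth_date": "1985-03-02", "route": "CDG-ATH", "card_vendor": "Mastercard"},
    {"birth_date": "1990-11-17", "route": "LYS-SKG", "card_vendor": "Visa"},
    {"birth_date": "1990-11-17", "route": "LYS-SKG", "card_vendor": "Visa"},
]

# One field alone singles nobody out; three fields together isolate half the table.
print(unique_fraction(records, ["birth_date"]))
print(unique_fraction(records, ["birth_date", "route", "card_vendor"]))
```

A record that is unique on its remaining fields can be matched against any outside source that holds the same fields, which is why stripping names alone does not make a PNR table safe to publish.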


Five-step flight factory

The Sheffield framework, dubbed SynPNR, builds a fake world in layers:

  1. Population engine
    Country-level census tables (UN & World Bank) are expanded into 4.7 million synthetic residents across 36 nations. Households get believable age gaps, marriage patterns and even transliterated names that fit local alphabets.
  2. Social glue
    Algorithms weave family, friend and co-worker ties inside cities, then add weaker long-distance links. Shared nationality boosts link probability by 50 %, a nod to sociological studies on homophily.
  3. Agent minds
    Every adult becomes an autonomous software agent with a calendar, bank card and frequent-flyer status. Agents decide how often they can travel (propensity), pick a purpose (business, holiday, family), choose companions from their network and select destinations weighted by real 2019 flight capacities.
  4. Seat allocator
    Valid airline routes (minimum 210 operating days/year) are loaded from OpenSky crowd-sourced data. The engine books parties into flights, respects seat availability, adds baggage drawn from purpose-specific distributions and timestamps the purchase.
  5. XML printer
    Output is a stack of industry-standard PNR files that plug straight into existing analyst tools.
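
A few of the rules in the steps above can be sketched in Python. This is an illustration under stated assumptions, not SynPNR's actual code: the function names, the base probability, the capacity figures and the reading of "boosts by 50 %" as a multiplicative factor of 1.5 are all hypothetical.

```python
import random


def link_probability(base_p, same_nationality, boost=0.5):
    """Step 2: tie probability, raised by 50 % for shared nationality.

    Assumption: the boost is multiplicative (base_p * 1.5), capped at 1.
    """
    p = base_p * (1.0 + boost) if same_nationality else base_p
    return min(p, 1.0)


def choose_destination(capacity_by_dest, rng):
    """Step 3: pick a destination with probability proportional to capacity."""
    dests = list(capacity_by_dest)
    weights = [capacity_by_dest[d] for d in dests]
    return rng.choices(dests, weights=weights, k=1)[0]


def valid_routes(operating_days_by_route, min_days=210):
    """Step 4: keep routes flown on at least min_days days per year."""
    return [r for r, days in operating_days_by_route.items() if days >= min_days]


rng = random.Random(42)  # fixed seed so the toy run is repeatable

# Made-up capacities and operating days, for illustration only.
capacity = {"ATH": 900_000, "SKG": 300_000, "HER": 0}
picks = [choose_destination(capacity, rng) for _ in range(1000)]
routes = valid_routes({"CDG-ATH": 365, "LYS-SKG": 120, "ORY-HER": 210})

print(link_probability(0.10, same_nationality=True))
print(picks.count("ATH"), picks.count("SKG"), picks.count("HER"))
print(routes)
```

Drawing destinations in proportion to capacity is one simple way to make busy routes attract more synthetic travellers; the real pipeline may combine capacity with trip purpose and other factors.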

Does it look authentic?

The group benchmarked its 4,250-flight French–Greek subset against 2019 Eurocontrol and national statistics:

Metric                        Real world    Synthetic
Average load factor           57 %          56.9 %
Hourly passenger flow shape   KLD = 0.015 (near-perfect match)
Solo travellers               70 %          83 % *
Business-trip length          4 days        3.7 days

* Over-generation of solo trips is already flagged for tuning in the next release.

The synthetic age–sex pyramids for France and Greece show a Kullback-Leibler divergence (KLD) of less than 0.05, a difference smaller than typical sampling error in official micro-censuses.
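
For readers unfamiliar with the metric, the Kullback-Leibler divergence between a reference distribution P and a candidate Q is the sum over bins of P(i) · log(P(i) / Q(i)); it is zero when the two match exactly and grows as they drift apart. The sketch below is a generic implementation, not the paper's evaluation code. The log base (natural log), the direction (real as P, synthetic as Q) and the toy bin values are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in nats for two histograms over the same bins.

    Inputs may be raw counts or shares; each is normalised to sum to 1.
    Bins where Q is zero are floored at eps to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("histograms must use the same bins")
    p_total, q_total = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / p_total, q_i / q_total
        if p_i > 0:  # bins with no reference mass contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Made-up shares of passengers in four time-of-day bins (illustration only).
real = [0.10, 0.40, 0.35, 0.15]
synthetic = [0.12, 0.38, 0.34, 0.16]

print(kl_divergence(real, real))
print(kl_divergence(real, synthetic))
```

Note that KLD is not symmetric: swapping the two arguments generally gives a different number, so the direction matters when comparing reported values.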


What can you do with it?

  • Train risk-scoring AIs without GDPR headaches.
  • Stress-test border-screening rules under synthetic epidemics or mass events.
  • Model group-based threats (e.g., small cells booking separate flights).
  • Share full datasets with external collaborators, with no data-use agreements needed.

Limits & next hops

  • Group travel is still too rare; social-network density will be increased.
  • Seasonal travel dips in late summer are weaker than in reality.
  • An expert usability review with border police is under way.
  • Code is already on GitHub; a larger 20-million-passenger run is scheduled for late 2025.

Take-away

SynPNR reproduces the statistical character of air-travel data without containing a single real traveller. For researchers boxed out of real PNR vaults, fake flights may finally unlock real insights.


Try it yourself

Code & sample data: github.com/fafadlian/Synthetic-PNR-Generation
Citation: Fadlian et al., “Generating Realistic Passenger Name Records with Privacy Compliance for Security Analysis”, ISCRAM 2025.

This project has received funding from the European Union's Horizon Europe research and innovation programme under grant agreement No 101074048