Open Tool Generates Privacy-Safe Passenger Data for Aviation-Security Research

Every time you board a plane you leave a digital breadcrumb trail: who you are, where you sit, how you paid, who you travel with, where you stayed last night. This record – called a Passenger Name Record (PNR) – is gold dust for security analysts who hunt suspicious travel patterns. Yet real PNRs are locked behind strict privacy laws and commercial secrecy, starving researchers of the data they need to build smarter, fairer screening tools.

A team from the University of Sheffield has now released an open-source pipeline that fabricates millions of realistic, privacy-free PNRs. The synthetic dataset mimics the statistical quirks of genuine traveller behaviour – down to the hour-of-day you are likely to buy a ticket – while keeping every single “passenger” fictitious. The work, presented at the ISCRAM 2025 conference, offers aviation-intelligence projects a safe sandbox in which to test machine-learning models, risk-scoring engines and what-if epidemic simulations without touching sensitive records.


Why not just anonymise the real thing?

Traditional anonymisation (remove names, hash passport numbers) still leaks identity through combinations—your birthday, route and credit-card vendor can re-identify you in milliseconds. Once data are open, re-identification is a one-way street. Regulators therefore keep PNRs under heavy lock-and-key, slowing innovation. Synthetic data sidestep the problem by growing entirely new people, flights and social networks that never existed.
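
The re-identification risk described above can be illustrated with a minimal sketch. The five records, the field names and the helper unique_fraction are all invented for this example; it only shows how combining a few harmless-looking fields makes rows unique.

```python
from collections import Counter

# Toy "anonymised" records: names removed, quasi-identifiers kept.
# Every value here is invented for illustration.
records = [
    {"birth_date": "1984-03-12", "route": "CDG-ATH", "card_vendor": "Visa"},
    {"birth_date": "1984-03-12", "route": "CDG-ATH", "card_vendor": "Mastercard"},
    {"birth_date": "1990-07-01", "route": "LYS-SKG", "card_vendor": "Visa"},
    {"birth_date": "1990-07-01", "route": "LYS-SKG", "card_vendor": "Visa"},
    {"birth_date": "1975-11-30", "route": "CDG-ATH", "card_vendor": "Visa"},
]


def unique_fraction(rows, keys):
    """Share of rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    singles = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return singles / len(rows)


route_only = unique_fraction(records, ["route"])
all_three = unique_fraction(records, ["birth_date", "route", "card_vendor"])
print(f"unique on route alone: {route_only:.0%}")
print(f"unique on birth date + route + card vendor: {all_three:.0%}")
```

In this toy set no row is unique on route alone, but most rows become unique once the three fields are combined, which is the leak that name removal does not close.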


Five-step flight factory

The Sheffield framework, dubbed SynPNR, builds a fake world in layers:

  1. Population engine
    Country-level census tables (UN & World Bank) are expanded into 4.7 million synthetic residents across 36 nations. Households get believable age gaps, marriage patterns and even transliterated names that fit local alphabets.
  2. Social glue
    Algorithms weave family, friend and co-worker ties inside cities, then add weaker long-distance links. Shared nationality boosts link probability by 50 %—a nod to sociological studies on homophily.
  3. Agent minds
    Every adult becomes an autonomous software agent with a calendar, bank card and frequent-flyer status. Agents decide how often they can travel (propensity), pick a purpose (business, holiday, family), choose companions from their network and select destinations weighted by real 2019 flight capacities.
  4. Seat allocator
    Valid airline routes (minimum 210 operating days/year) are loaded from OpenSky crowd-sourced data. The engine books parties into flights, respects seat availability, adds baggage drawn from purpose-specific distributions and timestamps the purchase.
  5. XML printer
    Output is a stack of industry-standard PNR files that plug straight into existing analyst tools.
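
As a rough illustration of the "social glue" step, the sketch below samples ties between pairs of residents and applies the 50 % boost for shared nationality. The baseline probability, the function names and the toy population are assumptions made for this example; the article does not describe SynPNR's actual sampling code.

```python
import random

BASE_LINK_PROB = 0.02   # assumed baseline, not a value from the paper
HOMOPHILY_BOOST = 1.5   # shared nationality raises link probability by 50 %


def link_probability(nat_a, nat_b, base=BASE_LINK_PROB):
    """Probability that two residents are linked, capped at 1."""
    p = base * HOMOPHILY_BOOST if nat_a == nat_b else base
    return min(p, 1.0)


def sample_links(people, rng):
    """people: list of (person_id, nationality). Returns sampled (id_a, id_b) ties."""
    links = []
    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            (id_a, nat_a), (id_b, nat_b) = people[i], people[j]
            if rng.random() < link_probability(nat_a, nat_b):
                links.append((id_a, id_b))
    return links


rng = random.Random(42)  # fixed seed so runs are repeatable
people = [(n, "FR" if n % 2 == 0 else "GR") for n in range(200)]
links = sample_links(people, rng)
print(len(links), "links sampled among", len(people), "residents")
```

The all-pairs loop is fine for a few hundred residents; a population of millions would need a smarter sampler, for example one restricted to residents of the same city.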

Does it look authentic?

The group benchmarked their 4250-flight French–Greek subset against 2019 Eurocontrol and national statistics:

Metric                        Real world    Synthetic
Average load factor           57 %          56.9 %
Hourly passenger flow shape   KLD = 0.015 (near-perfect match)
Solo travellers               70 %          83 % *
Business-trip length          4 days        3.7 days

* Over-generation of solo trips is already flagged for tuning in the next release.

Age–sex pyramids for France and Greece show a Kullback-Leibler divergence below 0.05 between the synthetic and real populations, a difference smaller than typical sampling error in official micro-censuses.
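
For readers unfamiliar with the Kullback-Leibler measure, here is a minimal sketch of how such a score is computed for two binned distributions. The age-band counts are invented, and the choice of natural logarithms (nats) is an assumption; the article does not say which base the authors used.

```python
import math


def normalise(counts):
    """Turn raw bin counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # a bin with zero mass in P contributes nothing
        if qi == 0:
            return math.inf   # P has mass where Q has none
        total += pi * math.log(pi / qi)
    return total


# Invented counts per age band, for illustration only.
reference = normalise([180, 240, 260, 200, 120])
synthetic = normalise([170, 250, 255, 205, 120])

kl = kl_divergence(reference, synthetic)
print(f"KL divergence: {kl:.5f}")
```

A value of zero means the two distributions are identical; the score grows as the synthetic shape drifts away from the reference.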


What can you do with it?

  • Train risk-scoring AIs without GDPR headaches.
  • Stress-test border-screening rules under synthetic epidemics or mass-events.
  • Model group-based threats (e.g., small cells booking separate flights).
  • Share full datasets with external collaborators—no data-use agreements needed.

Limits & next hops

  • Group travel is still too rare; social-network density will be dialed up.
  • Seasonal travel dips in late-summer are weaker than reality.
  • Expert usability review with border police is under way.
  • Code already on GitHub; larger 20-million-passenger run scheduled for late-2025.

Take-away

SynPNR reproduces the statistical soul of air-travel data while containing no real passenger's personal information. For researchers boxed out of real PNR vaults, fake flights may finally unlock real insights.


Try it yourself

Code & sample data: github.com/fafadlian/Synthetic-PNR-Generation
Citation: Fadlian et al., “Generating Realistic Passenger Name Records with Privacy Compliance for Security Analysis”, ISCRAM 2025.

Making Air Travel Security Smarter: New Insights from Europe’s Passenger Data Study

Every year, billions of people board flights. Each reservation leaves a digital footprint—names, travel dates, seat numbers, payment details—collectively called a Passenger Name Record (PNR). After the 9/11 attacks, security services worldwide realised that, if pooled and analysed, these records could reveal hidden plots before anyone reaches the airport. The European Union therefore passed the PNR Directive in 2016, asking every member state to set up a special office—a Passenger Information Unit (PIU)—to gather, check and share these clues.

As part of the EU-funded TENACITy project, researchers circulated a confidential questionnaire to PIU analysts, technicians and directors across 14 countries between December 2023 and February 2024. Their frank answers sketch a picture of promise, but also of potholes that slow the road to safer skies.


1. The Good News First

  • PIUs are up and running. All units that replied already collect PNR data and most can perform basic computer searches.
  • Staff overwhelmingly believe the data do help: hidden links between suspects, suspicious money trails and even drug routes have been spotted sooner than old-fashioned police work would have managed.
  • International cooperation is alive: PIUs swap tips with sister units, Europol and, in some cases, non-EU partners.

2. Six Headaches Revealed by the Survey

2.1 Data Transmission – “Same Language, Different Accents”

Airlines are free to choose how they transmit records, whether as spreadsheets or XML messages. Some still email Excel files. The result: formats differ, fields swap places, and one in five messages arrives unreadable without manual tidying.

2.2 Data Collection – “Plenty of Noise, Not Always Signal”

Budget tickets bought through travel agencies often lack phone numbers or correct spelling. Because carriers are not obliged to double-check typing, analysts later waste hours guessing whether “J. Smiht” is simply a typo or a deliberate smokescreen.
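
A first-pass check for this kind of misspelling can be sketched with Python's standard difflib module. The threshold, the helper names and the watch-list entries are assumptions for illustration; a similarity score can surface candidates such as "J. Smiht", but it cannot tell an honest typo from a deliberate one.

```python
from difflib import SequenceMatcher


def normalise_name(name):
    """Lower-case, drop full stops and collapse repeated spaces."""
    return " ".join(name.lower().replace(".", " ").split())


def name_similarity(a, b):
    """Similarity in [0, 1] between two names after normalisation."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


def likely_variants(name, known_names, threshold=0.8):
    """Known names that are close to `name` but not identical, best match first."""
    scored = [(known, name_similarity(name, known)) for known in known_names]
    close = [(known, score) for known, score in scored if threshold <= score < 1.0]
    return sorted(close, key=lambda pair: pair[1], reverse=True)


# Invented reference list for illustration.
known = ["J. Smith", "A. Papadopoulos", "M. Dupont"]
for candidate, score in likely_variants("J. Smiht", known):
    print(f"'J. Smiht' resembles '{candidate}' (similarity {score:.2f})")
```

The threshold of 0.8 is a guess that would need tuning on real booking data, and names transliterated from other alphabets would likely need a phonetic or script-aware comparison instead.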

2.3 Data Quality – “Missing Puzzle Pieces”

Optional fields (date of birth, payment method, baggage details) are frequently blank. Missing birthday information is especially frustrating; without age, risk-scoring algorithms misfire and toddlers can be flagged next to terror suspects.

2.4 Data Analysis – “Broken Travels, Blind Spots”

Europe currently collects air data only. Clever offenders dodge detection by flying into Istanbul, then continuing overland by bus or ferry. PIUs call this “broken travel” and warn they are blind to the second leg.

2.5 Staff Matters – “Too Much Haystack, Too Few Hands”

Most units operate with fewer than twenty specialists. Manually eyeballing every record is impossible, yet many PIUs lack funds for data-science training or modern visualisation tools.

2.6 Legal Matters – “Navigating a Maze”

A 2022 EU court ruling tightened the screws: data may be stored longer than six months only if a concrete security threat is documented. Staff now juggle shorter retention windows, extra paperwork and the need to show “human eyes” reviewed every automated alert.


3. Practical Fixes Proposed by the People in the Trenches

  1. Mandatory fields: Make date of birth, citizenship and contact details compulsory at the booking screen; a simple pop-up could block incomplete orders.
  2. Automatic spell-check on entry: Airlines already verify credit-card numbers—why not names and passport digits?
  3. Shared templates: One common Europol form, with drop-down menus, for state-to-state queries would cut email ping-pong.
  4. Robot helpers: Small “quality bots” that highlight misspellings or impossible travel itineraries the moment files land, freeing analysts for real detective work.
  5. Smarter rules engines: Software that learns from past hits and suggests new risk patterns, always keeping a human in the final loop.
  6. Multi-modal vision: Explore extending PNR-style collection to ferries, international trains and long-distance coaches, but only where necessity is proven and privacy safeguards travel alongside.
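
The "quality bot" idea in item 4 can be sketched as a small check for itineraries that no traveller could physically fly. The Leg structure, the 30-minute minimum connection time and the sample legs are all assumptions for this example, and times are taken to be in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Leg:
    passenger_id: str
    origin: str
    destination: str
    departure: datetime   # assumed to be UTC
    arrival: datetime     # assumed to be UTC


MIN_CONNECTION = timedelta(minutes=30)  # assumed threshold, not from the survey


def impossible_pairs(legs):
    """Return consecutive leg pairs per passenger that cannot both be flown."""
    by_passenger = {}
    for leg in legs:
        by_passenger.setdefault(leg.passenger_id, []).append(leg)

    flagged = []
    for passenger_legs in by_passenger.values():
        passenger_legs.sort(key=lambda leg: leg.departure)
        for prev, nxt in zip(passenger_legs, passenger_legs[1:]):
            overlap = nxt.departure < prev.arrival
            too_tight = nxt.departure - prev.arrival < MIN_CONNECTION
            wrong_airport = nxt.origin != prev.destination
            if overlap or (wrong_airport and too_tight):
                flagged.append((prev, nxt))
    return flagged


# Invented bookings for illustration.
legs = [
    Leg("P1", "CDG", "ATH", datetime(2024, 5, 1, 8, 0), datetime(2024, 5, 1, 11, 15)),
    Leg("P1", "LHR", "JFK", datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 18, 0)),
    Leg("P2", "CDG", "ATH", datetime(2024, 5, 1, 8, 0), datetime(2024, 5, 1, 11, 15)),
    Leg("P2", "ATH", "SKG", datetime(2024, 5, 1, 13, 0), datetime(2024, 5, 1, 14, 0)),
]

flagged = impossible_pairs(legs)
for prev, nxt in flagged:
    print(f"{prev.passenger_id}: {prev.origin}-{prev.destination} clashes with "
          f"{nxt.origin}-{nxt.destination}")
```

A flag here is only a prompt for a human analyst: a clash may point to a data-entry error, a duplicate booking or a shared passenger identifier rather than anything suspicious.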

4. Why This Matters Beyond Airports

The study is a reality check for “big data” security more generally. Simply pouring oceans of information into computers does not automatically produce answers. Standard formats, data hygiene and well-trained humans remain the secret sauce. Policymakers worldwide – whether crafting health, climate or cyber-risk databases – can draw the same lesson: invest first in quality and cooperation, then in fancy algorithms.

A Blockchain Safety Net for Passenger Data

1. The privacy-versus-security headache

Every year Europe’s Passenger Information Units (PIUs) collect hundreds of millions of booking records—names, seats, bag weights, payment slips—known as Passenger Name Records (PNRs).
The data are vital for spotting drug mules, trafficking routes or terror suspects, yet they are also deeply personal. EU law therefore demands:

  • mask identities after 6 months
  • delete everything after 5 years
  • log every single access
  • share only with vetted partners

Traditional file exchanges or e-mail satisfy none of these points: logs can be edited, copies leak, and nobody can prove later who touched which record when.
Enter blockchain—a technology famous for cryptocurrencies but far better at tamper-proof audit trails.
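
The masking and deletion deadlines listed above can be expressed as a simple rule, sketched here. The day counts are approximations of "6 months" and "5 years", and the field names are hypothetical; real legal deadlines are calendar-based and defined by the applicable law, not by this code.

```python
from datetime import date, timedelta

MASK_AFTER = timedelta(days=183)         # roughly six months (assumed day count)
DELETE_AFTER = timedelta(days=5 * 365)   # roughly five years (assumed day count)
IDENTITY_FIELDS = ("name", "passport_no", "contact")  # hypothetical field names


def retention_state(received, today):
    """Classify a record by age: 'clear', 'masked' or 'delete'."""
    age = today - received
    if age >= DELETE_AFTER:
        return "delete"
    if age >= MASK_AFTER:
        return "masked"
    return "clear"


def apply_retention(record, received, today):
    """Return the record as it may be shown today, or None if it must be gone."""
    state = retention_state(received, today)
    if state == "delete":
        return None
    if state == "masked":
        return {k: ("***" if k in IDENTITY_FIELDS else v) for k, v in record.items()}
    return dict(record)


# Invented record for illustration.
pnr = {"name": "A. Example", "passport_no": "X0000000", "route": "CDG-ATH"}
print(apply_retention(pnr, date(2024, 1, 1), date(2024, 3, 1)))
print(apply_retention(pnr, date(2024, 1, 1), date(2024, 9, 1)))
print(apply_retention(pnr, date(2019, 1, 1), date(2024, 6, 1)))
```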

2. Why a simple database is not enough

A central warehouse (think “PNR cloud”) looks convenient, but it concentrates several risks in one place:

Risk             Example
Insider threat   Rogue analyst downloads entire set
Tampering        Someone re-opens a closed case
Repudiation      Officer denies requesting the data
Availability     Single hack takes all systems down

What security engineers need is decentralisation, cryptographic logging and fine-grained access control without building separate national silos.

3. Blockchain 101 for aviation intelligence

Key ideas in blockchain:

  • Ledger = append-only spreadsheet copied to every approved node
  • Each row = hash of content + previous row → change one letter, break the chain
  • Smart contracts = small programs that fire only when pre-defined rules are met
  • Permissioned network = only accredited PIU nodes may join; no anonymous miners
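
The "hash of content + previous row" idea can be shown with a toy ledger in a few lines. This is a teaching sketch using only SHA-256 from the standard library; it is not how Hyperledger Fabric or any production ledger is implemented, and the function names and log entries are invented.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first row


def row_hash(content, prev_hash):
    """SHA-256 over the row content together with the previous row's hash."""
    payload = json.dumps({"content": content, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_row(ledger, content):
    """Append a row that is chained to the current last row."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"content": content, "prev": prev, "hash": row_hash(content, prev)})


def verify(ledger):
    """Recompute every hash; any edited row breaks the chain from that point on."""
    prev = GENESIS
    for row in ledger:
        if row["prev"] != prev or row["hash"] != row_hash(row["content"], prev):
            return False
        prev = row["hash"]
    return True


ledger = []
append_row(ledger, "PIU-A requested record 17")
append_row(ledger, "PIU-B answered request 17")
append_row(ledger, "PIU-A confirmed receipt")
print("intact ledger verifies:", verify(ledger))

ledger[1]["content"] = "PIU-B answered request 18"   # change one character
print("tampered ledger verifies:", verify(ledger))
```

Copying the ledger to every approved node is what makes quiet tampering impractical: an attacker would have to rewrite the chain on all nodes at once.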

4. How TENACITy wired a blockchain into real border policing

TENACITy is building a modular toolbox that helps PIUs fuse PNR with open-source intel, risk-scoring AI and pattern-mining engines. Its blockchain communication module, developed by Brno University of Technology, handles the most sensitive step: unit-to-unit data exchange.

4.1 Design choices

Requirement                       Technical answer
Keep data EU-only                 Hyperledger Fabric permissioned network
Hide payload from third parties   Private-data collections + off-chain encrypted blob storage
Prove nothing was altered         Store only hashes on-chain
Automatic deletion                Chaincode flags blobs for purge after legal deadline
No gas fees                       Fabric’s Raft consensus, no Proof-of-Work

4.2 Life of a typical request

  1. Officer in PIU-A opens TENACITy dashboard → hits “Request legs for passenger XYZ”
  2. API server writes a hashed request onto the channel; only PIU-B can decrypt the blob
  3. PIU-B chaincode checks endorsement policy, attaches encrypted PNR, timestamps response
  4. Both sides confirm receipt; payload hashes remain, blobs are cryptographically erased
  5. Any auditor can later verify: “Yes, hash ABC was written by PIU-A at 09:14:07 and answered by PIU-B at 09:18:22”—without ever seeing the raw PNR
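
The five steps above can be mimicked with a toy model in which only hashes go on the shared log and the payload lives in a separate store until receipt is confirmed. Everything here is a stand-in: the list and dictionary replace the Fabric channel and off-chain storage, the payload is a placeholder byte string rather than real ciphertext, the calendar date is invented, and simply removing the blob is a simplification of cryptographic erasure.

```python
import hashlib
from datetime import datetime, timezone

ledger = []       # stand-in for the shared channel: hashes and metadata only
blob_store = {}   # stand-in for off-chain storage, keyed by payload hash


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def record(event, actor, data, when, in_reply_to=None):
    """Write a hash-only entry to the ledger and return the payload hash."""
    digest = sha256_hex(data)
    ledger.append({"event": event, "actor": actor, "hash": digest,
                   "time": when.isoformat(), "in_reply_to": in_reply_to})
    return digest


def audit_trail(request_hash):
    """Entries an auditor can see for one request, without any payload."""
    return [e for e in ledger
            if e["hash"] == request_hash or e["in_reply_to"] == request_hash]


# Steps 1-2: PIU-A asks; only the hash of the request is written.
t_request = datetime(2025, 1, 15, 9, 14, 7, tzinfo=timezone.utc)   # invented date
request_hash = record("request", "PIU-A", b"legs for passenger XYZ", t_request)

# Step 3: PIU-B answers; the (placeholder) encrypted PNR is stored off-ledger.
t_response = datetime(2025, 1, 15, 9, 18, 22, tzinfo=timezone.utc)
encrypted_pnr = b"<ciphertext placeholder>"
response_hash = record("response", "PIU-B", encrypted_pnr, t_response,
                       in_reply_to=request_hash)
blob_store[response_hash] = encrypted_pnr

# Step 4: PIU-A checks the blob against the logged hash, then the blob is removed.
received_ok = sha256_hex(blob_store[response_hash]) == response_hash
del blob_store[response_hash]

# Step 5: the auditor sees who did what and when, but no raw PNR.
for entry in audit_trail(request_hash):
    print(entry["time"], entry["actor"], entry["event"], entry["hash"][:12])
```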

5. Early numbers & lessons

  • Pilot network spans 6 EU PIUs, 12 peer nodes, <1 s latency inside Europe
  • Throughput tested at 2 000 request/response pairs per hour—enough for daily peaks
  • Over 80 % of test users rated the blockchain log “more trustworthy” than classical e-mail exchange
  • Legal team likes built-in deletion: no forgotten files lingering on FTP servers

6. Take-away

Blockchain is not magic, but the combination of tamper-proof logging, programmable deletion and operation between mutually distrusting parties is exactly what PNR law demands. Within TENACITy the ledger is no cryptocurrency – it is a digital notary that lets border analysts share vital clues while proving, beyond doubt, that citizens’ rights were respected every step of the way.

This project has received funding from the European Union's Horizon Europe research and innovation programme under grant agreement No. 101074048.