How to Build a Data Flow Diagram (DFD) for Threat Modeling

In the previous post we introduced threat modeling and the STRIDE framework. STRIDE tells you what to look for. A Data Flow Diagram (DFD) tells you where to look. Together they are the most widely used combination in practical threat modeling — and for good reason.

A DFD is a map of your system. It shows how data moves between components, which components process it, where it is stored, and — critically — where trust changes hands. Once you have that map, applying STRIDE is straightforward: you walk each element and ask "how could an attacker abuse this?"

The five elements of a DFD

A threat-modeling DFD uses five building blocks. You only need these five, and keeping them distinct is what gives the diagram its analytical power.

Element	Symbol	What it represents	Examples
External Entity	[ rectangle ]	A person or system outside your control that sends or receives data	End user, mobile app, third-party API, payment gateway
Process	( circle )	A component you own that transforms or acts on data	Web server, API service, background worker, Lambda function
Data Store	= parallel lines =	Anywhere data is persisted at rest	SQL database, Redis cache, S3 bucket, message queue, log file
Data Flow	——> arrow	Data moving between elements; labelled with what is being transferred	HTTP request, SQL query, file read, webhook payload
Trust Boundary	- - - dashed box - - -	A line across which trust changes — your most important threat surface	Internet/internal network edge, user/admin privilege boundary, cloud account boundary

Rule of thumb: every data flow that crosses a trust boundary is a candidate for at least one STRIDE threat. Start there.
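
As a rough illustration of the rule of thumb, the five elements can be written down as plain data. This is a minimal sketch, not any tool's API: the `Kind`, `Element`, and `DataFlow` names and the `zone` field are hypothetical, and the trust boundary is modelled implicitly as a difference in zone between the two ends of a flow.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    EXTERNAL_ENTITY = "external entity"
    PROCESS = "process"
    DATA_STORE = "data store"


@dataclass(frozen=True)
class Element:
    name: str
    kind: Kind
    zone: str  # hypothetical label for the trust zone the element sits in


@dataclass(frozen=True)
class DataFlow:
    source: Element
    target: Element
    label: str  # what is being transferred

    def crosses_boundary(self) -> bool:
        # A flow crosses a trust boundary when its endpoints are in different zones.
        return self.source.zone != self.target.zone


def boundary_crossings(flows):
    """Return the flows to analyse first: those that cross a trust boundary."""
    return [flow for flow in flows if flow.crosses_boundary()]


user = Element("User", Kind.EXTERNAL_ENTITY, zone="internet")
api = Element("API Server", Kind.PROCESS, zone="internal")
db = Element("User Database", Kind.DATA_STORE, zone="internal")

flows = [
    DataFlow(user, api, "HTTP request"),
    DataFlow(api, db, "SQL query"),
]

for flow in boundary_crossings(flows):
    print(f"{flow.source.name} -> {flow.target.name}: {flow.label}")
```

In this toy example only the User to API Server flow is reported, because the API server and the database were placed in the same zone. Putting the database in its own zone would surface the second flow as well.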

Why DFDs are the foundation of STRIDE

STRIDE threat categories map directly onto DFD element types. This is not a coincidence: STRIDE originated at Microsoft, whose threat-modeling practice pairs it with DFDs precisely because the combination is so productive:

External Entities are the primary source of Spoofing threats — can the system verify who is on the other end?
Data Flows are vulnerable to Tampering and Information Disclosure — can data be modified or intercepted in transit?
Processes can fail on Repudiation (no audit trail), Denial of Service (resource exhaustion), and Elevation of Privilege (insufficient authorisation checks).
Data Stores are the primary target for Information Disclosure and Tampering at rest.
Trust Boundaries are where almost every category is relevant — they mark the points where the system must make, and enforce, a trust decision.

Once you have a DFD, you can mechanically walk every element with the STRIDE checklist. It becomes a structured search, not a creative brainstorm.
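
The mechanical walk can be sketched in a few lines. The mapping below simply restates the list above, so it is a starting point rather than a complete STRIDE-per-element chart; `STRIDE_BY_ELEMENT` and `stride_checklist` are hypothetical names chosen for this illustration.

```python
# Element type -> STRIDE categories to ask about first (restating the list above).
STRIDE_BY_ELEMENT = {
    "external entity": ["Spoofing"],
    "data flow": ["Tampering", "Information Disclosure"],
    "process": ["Repudiation", "Denial of Service", "Elevation of Privilege"],
    "data store": ["Information Disclosure", "Tampering"],
}


def stride_checklist(elements):
    """Yield (element name, threat category) pairs to review, in diagram order."""
    for name, kind in elements:
        for category in STRIDE_BY_ELEMENT[kind]:
            yield name, category


elements = [
    ("User", "external entity"),
    ("Web / API Server", "process"),
    ("User Database", "data store"),
]

checklist = list(stride_checklist(elements))
for name, category in checklist:
    print(f"{name}: {category}?")
```

Each printed line is a question to answer, not a confirmed threat. The value is that no element gets skipped.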

Step-by-step: drawing your first DFD

Let us build a DFD for a simple but realistic system: a web application with user authentication, a REST API, a database, and an external email provider. Many SaaS applications share this basic shape.

1. Identify the external entities

Who or what interacts with your system from outside your control? In our example: the User (via a browser) and the Email Provider (e.g. SendGrid or Mailgun). These sit at the edges of the diagram. They are not yours to secure, but every data flow to or from them is.

2. Identify your processes

What components do you own and operate that transform or act on data? Here: the Web / API Server handling HTTP requests and the Background Worker sending emails. Each process is a trust boundary candidate: who can call it, and with what authority?

3. Identify your data stores

Where does data live at rest? Here: the User Database (accounts and sessions) and the Email Queue (pending outbound messages). Data stores have their own threat surface: access controls, encryption at rest, and audit logging all belong here.

4. Draw the data flows

Connect elements with labelled arrows showing exactly what data moves and in which direction. Be specific: "username + bcrypt hash" is more useful than "credentials", and "JWT in Authorization header" is more useful than "token". The specificity is what makes STRIDE analysis concrete.

5. Draw the trust boundaries

Where does trust change? The internet-to-server edge is obvious. Also mark: the boundary between your web server and the database (shared credentials?), and the boundary between your infrastructure and the external email provider. Every flow crossing a boundary is a mandatory threat-modeling stop.

Applying these five steps to the example gives the following diagram. Notice how the trust boundary makes it immediately clear which data flows are the highest-risk analysis points.

[Figure: DFD of the example web application. Legend: external entity = outside your control; process = component you own; data store = data at rest; data flow = labelled arrow; trust boundary = where trust changes.]

With this diagram in hand, you can now apply STRIDE systematically to every element and every flow that crosses a boundary.
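
For readers who prefer text to pictures, the example system from the five steps can also be captured as data. This is a sketch under stated assumptions: the zone names, the placement of the Email Queue in the application zone, and the flow label marked "assumed" are illustrative choices, not details taken from the diagram.

```python
# Trust zone for each element. Zone names are hypothetical labels.
ZONES = {
    "User": "internet",
    "Email Provider": "third party",
    "Web / API Server": "application",
    "Background Worker": "application",
    "Email Queue": "application",  # assumption: the queue runs beside the app
    "User Database": "database",
}

# (source, target, label) for each data flow.
FLOWS = [
    ("User", "Web / API Server", "JWT in Authorization header"),
    ("Web / API Server", "User Database", "username + bcrypt hash"),
    ("Web / API Server", "Email Queue", "pending outbound message"),
    ("Email Queue", "Background Worker", "pending outbound message"),
    ("Background Worker", "Email Provider", "outbound email (assumed label)"),
]


def crossing(flows, zones):
    """Keep only the flows whose two ends sit in different trust zones."""
    return [
        (source, target, label)
        for source, target, label in flows
        if zones[source] != zones[target]
    ]


stops = crossing(FLOWS, ZONES)
for source, target, label in stops:
    print(f"{source} -> {target}: {label}")
```

Under these assumptions three of the five flows cross a boundary: user to server, server to database, and worker to email provider. Those are the mandatory stops from step 5.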

Trust boundaries: a deeper look

Trust boundaries are where threat modeling earns its keep. A trust boundary exists anywhere two things interact with different levels of privilege, accountability, or control. They are not always obvious — here are the most common ones teams miss:

Network boundaries

The line between the public internet and your private network is the most obvious trust boundary. Any data flow crossing it — inbound requests, outbound webhooks, API calls to third-party services — must be authenticated, authorised, and encrypted.

Process privilege boundaries

If one process runs as root and another runs as an unprivileged user, there is a trust boundary between them. If a web server calls a background process with elevated database permissions, data flowing across that call crosses a trust boundary. An attacker who compromises the web server can exploit that elevated permission if it is not carefully controlled.

Cloud account and VPC boundaries

In cloud environments, trust boundaries map to IAM role boundaries, VPC peering connections, and account-level separations. A Lambda function reading from an S3 bucket in a different AWS account crosses a trust boundary. So does a microservice calling another microservice across a VPC peering link, even if both are "internal".

User vs. admin privilege

Any API route that behaves differently for an admin user than for a regular user implies a privilege boundary. Mark it on your DFD. The classic Elevation of Privilege threat lives exactly here: can a regular user reach an admin-only code path?

Practical tip: when in doubt about whether something is a trust boundary, ask: "if element A were completely compromised by an attacker, what could they do to element B?" If the answer is "a lot", you have a trust boundary that needs a security control.
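
One crude way to approximate the question in the tip is to follow the declared data flows outward from the compromised element. The sketch below assumes flows are given as (source, target) pairs; the element names and the `reachable_from` helper are hypothetical. A real attacker is not limited to the flows you drew, so treat the result as a lower bound.

```python
from collections import deque


def reachable_from(compromised, flows):
    """Breadth-first search over data flows starting at a compromised element."""
    graph = {}
    for source, target in flows:
        graph.setdefault(source, []).append(target)

    seen = {compromised}
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    seen.discard(compromised)  # report only what lies beyond the starting point
    return seen


flows = [
    ("Web Server", "Background Worker"),
    ("Background Worker", "Database"),
    ("Web Server", "Cache"),
    ("Admin Console", "Database"),
]

print(sorted(reachable_from("Web Server", flows)))
```

Here a compromised web server can reach the background worker, the cache, and (through the worker) the database, which suggests the server-to-worker call deserves a boundary and a control.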

Common DFD mistakes to avoid

Over-complicating the diagram

A DFD is not an architecture diagram. You do not need to show every microservice, load balancer, CDN node, and health-check endpoint. Draw at the level of granularity where trust decisions are made. If two services share the same database credentials and run in the same network segment, they can often be a single process node for threat-modeling purposes. You can always decompose further in a follow-up DFD scoped to a specific component.

Missing trust boundaries

This is the most common and most costly mistake. Teams draw the data flows accurately but forget to mark trust boundaries, then wonder why their threat list feels thin. Every flow that crosses a trust boundary is a threat surface. Every trust boundary that is missing from the diagram is a blind spot.

Treating data stores as passive

Data stores are not just the endpoints of arrows. They have their own threat profile: who can read them? Who can write to them? Are they encrypted at rest? Is access logged? A database that multiple processes can write to without row-level access control is a significant Tampering and Information Disclosure risk, regardless of how well the API in front of it is secured.
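
A quick check for the multi-writer situation described above might look like the following. It is a sketch only: flows are assumed to be (source, target) pairs, the helper names are hypothetical, and the Background Worker writing to the User Database is an assumption added to make the example interesting.

```python
from collections import defaultdict


def writers_per_store(flows, stores):
    """Map each data store to the set of elements that send data into it."""
    writers = defaultdict(set)
    for source, target in flows:
        if target in stores:
            writers[target].add(source)
    return dict(writers)


def shared_write_stores(flows, stores):
    """Return the stores written to by more than one element, sorted by name."""
    return sorted(
        store
        for store, sources in writers_per_store(flows, stores).items()
        if len(sources) > 1
    )


stores = {"User Database", "Email Queue"}
flows = [
    ("Web / API Server", "User Database"),
    ("Background Worker", "User Database"),  # assumed flow, for illustration
    ("Web / API Server", "Email Queue"),
    ("Email Queue", "Background Worker"),
]

print(shared_write_stores(flows, stores))
```

Any store this reports is worth a closer look at per-writer permissions and audit logging.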

Vague data flow labels

A data flow labelled "data" or "request" tells you nothing useful. Label flows with the actual content: "username + bcrypt hash", "JWT in Authorization header", "S3 pre-signed URL". The specificity forces you to think about what is actually being transmitted and makes STRIDE analysis far more targeted.
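
If the diagram is stored as data, vague labels can be caught with a simple lint. The word list below is an assumption seeded from the examples in this post ("data", "request", "credentials", "token") plus a few similar terms; extend it to suit your team.

```python
# Labels that say nothing about what is actually transmitted (assumed list).
VAGUE_LABELS = {"data", "request", "response", "credentials", "token", "info"}


def vague_flows(flows):
    """Return the (source, target, label) flows whose label is too generic."""
    return [
        (source, target, label)
        for source, target, label in flows
        if label.strip().lower() in VAGUE_LABELS
    ]


flows = [
    ("User", "Web / API Server", "Request"),
    ("User", "Web / API Server", "JWT in Authorization header"),
    ("Web / API Server", "User Database", "username + bcrypt hash"),
    ("Web / API Server", "Email Queue", "data"),
]

for source, target, label in vague_flows(flows):
    print(f"Relabel {source} -> {target}: '{label}' is too vague")
```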

Drawing the happy path only

DFDs typically show the successful, intended flow. Threat modeling requires you to also think about error paths, administrative interfaces, monitoring and logging pipelines, and batch jobs. An admin backdoor that is missing from the DFD crosses a trust boundary that nobody is analysing.

How ThreatTree makes this easier

Drawing and maintaining DFDs by hand — in Visio, draw.io, or on a whiteboard — works, but it breaks down quickly when the system changes. ThreatTree's DFD editor is built specifically for threat modeling:

Drag-and-drop elements — processes, data stores, external entities, and trust boundaries are first-class objects with semantic meaning, not just shapes.
STRIDE prompts per element — as you add elements and flows, ThreatTree surfaces relevant STRIDE questions, so you are prompted to think about threats as you draw.
Auto-generated risk register — threats you identify during the DFD session are captured directly into a risk register with severity, likelihood, and mitigation fields.
Cloud provider theming — AWS, Azure, and GCP icon sets are built in, so your cloud architecture reads naturally to the people in the room.
Linked forests — a DFD can live alongside Attack Trees in the same Forest, letting you navigate from a high-level threat scenario all the way down to the specific DFD element it exploits.

In the final post in this series, we close the loop: how to take your completed DFD and Attack Trees and turn them into an actionable, shareable risk register — and how to keep that risk register alive as your system evolves.