Data flow analysis is an old but valuable technique that helps you understand and specify a process through functional decomposition. It’s especially useful when the overall function of the system is well defined, but there isn’t yet enough information to drive the detailed design of modules. I will illustrate the process of data flow analysis using a real application that I have designed.

Data Flow Diagrams

The result of data flow analysis is a set of data flow diagrams.

The purpose of a data flow diagram is to show transformations of data as it flows through the system. Each transformation is represented by a named process. The direction of data flow between processes is specified as well.

Here is a generic data flow diagram: a generic data flow diagram

  • Arrows are data flows. Each data flow has a data type and a direction.
  • Ellipses are processes. Each process has at least one incoming and at least one outgoing data flow. Each process transforms data in some way.
  • Rectangles are files. A file represents a data store and has a data type, just like a data flow. Processes can read from and write to files. A file doesn’t imply any specific kind of persistence mechanism.
  • The dashed shape is the system’s boundary. Everything inside the boundary is what we are going to design and implement. Everything outside the boundary is external context.

Processes in data flow diagrams do not correspond to system components or modules. They represent abstract functional units, each responsible for a specific data transformation. By dealing with processes we avoid making design decisions prematurely. Instead we focus on the steps the system must perform to satisfy its overall function. During design, the processes identified by analysis can be mapped to components/modules.

Note that a context diagram is a special case of a data flow diagram. In a context diagram we specify data flows crossing the system boundary and ignore what’s happening inside the system.

Example Problem

I will use a simplified version of a real application. It is a web API proxy for a serial communications protocol. The purpose of the tool is to simplify integration of GUI applications with device networks. This is achieved by hiding the low-level protocol behind a resource-oriented API.

The context diagram below indicates the system’s function - converting API resources passed by clients into protocol frames, which are then sent to the network of connected devices. It works the same way in the opposite direction. How resources are mapped to network and device addresses is determined by a resource mapping provided by the system’s administrator.

context diagram of I/O server

For simplicity I will limit the system’s functionality to the three most important use cases:

  • read a resource
  • write a resource
  • configure a resource

Starting at System Boundaries

From the context diagram above we can tell that, at a high level, the system’s responsibility is transforming data from one form into another. Now it’s time to create a more granular decomposition of the problem by specifying how this transformation can be performed.

But before we can specify intermediate data structures, we need to specify the initial input/output data.

Here is the definition of api_resource. It includes an id field and a set of properties. Each property is a name-value pair.

class api_resource {
    id: string;
    properties: property[];
}

class property {
    name: string;
    value: string;
}

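To make this concrete, here is a hypothetical instance of an api_resource; the id and property names are invented for illustration:

```typescript
// Hypothetical example instance of api_resource (made-up values).
interface property {
  name: string;
  value: string;
}

interface api_resource {
  id: string;
  properties: property[];
}

const sensor: api_resource = {
  id: "sensor-42",
  properties: [
    { name: "temperature", value: "21.5" },
    { name: "units", value: "celsius" },
  ],
};
```
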
The protocol_frame type describes a specific request to read or update data in a specified device’s memory register. One api_resource can correspond to one or more protocol_frames.

class protocol_frame {
    device_address: byte;
    register_address: byte;
    operation_type: byte;
    data: byte[];
}

Analyzing the Write Use Case

We already know that when a client passes an instance of an api_resource, the system must convert it to one or more protocol_frames. We can start the analysis either from the beginning or the end of the data transformation chain.

Step 1 - Mapping to Registers

When a resource arrives, the system must determine which register address each property corresponds to. The output of this is a list of register_value objects. Each register_value represents a location the system has to update in order to satisfy the incoming request.

class register_value {
    device_address: byte;
    register_address: byte;
    data: uint16;
}

Let’s call this process map to registers. To perform the transformation, the process needs additional data - the mapping configuration. It’s called resource to register map on the diagram. The map is shown as a file - static data stored within the system.

step 1 of writing data flow analysis
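
A minimal sketch of the map to registers process, assuming a hard-coded mapping from property names to addresses and numeric property values (the names and addresses here are invented; the real map comes from the administrator’s configuration):

```typescript
interface register_value {
  device_address: number;   // byte in the protocol
  register_address: number; // byte
  data: number;             // uint16
}

// Invented resource-to-register map; stands in for the configured
// "resource to register map" file from the diagram.
const resource_to_register_map: Record<string, { device: number; register: number }> = {
  temperature: { device: 0x01, register: 0x10 },
  setpoint: { device: 0x01, register: 0x11 },
};

// Transform a resource's properties into the register values to update.
function map_to_registers(
  properties: { name: string; value: string }[]
): register_value[] {
  return properties
    .filter((p) => p.name in resource_to_register_map)
    .map((p) => {
      const addr = resource_to_register_map[p.name];
      return {
        device_address: addr.device,
        register_address: addr.register,
        data: Number(p.value) & 0xffff, // clamp to uint16
      };
    });
}
```
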

Step 2 - Creating a Write Request

At this point the data looks closer to what we need, but it’s still just a list of individual registers to update. We know that the protocol allows writing multiple registers at once, so the system needs to combine register values into larger units of work. Let’s call them write_requests.

To completely represent a request, one additional piece of protocol-specific metadata is necessary - operation_type. The system will determine its value and add it to the object.

We end up with this model of write_request:

class write_request {
    device_address: byte;
    register_address: byte;
    operation_type: byte;
    data: ushort[];
}

Now we can put the process of transforming register_values into write_requests on the diagram.

step 2 of writing data flow analysis
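
One possible sketch of this grouping step, assuming the protocol’s multi-register write covers a run of consecutive registers on a single device (the operation code is a placeholder, not a real protocol value):

```typescript
interface register_value {
  device_address: number;
  register_address: number;
  data: number; // uint16
}

interface write_request {
  device_address: number;
  register_address: number; // first register of the run
  operation_type: number;
  data: number[];           // one ushort per register
}

const WRITE_MULTIPLE = 0x10; // placeholder operation code

// Combine register values into write requests: one request per run of
// consecutive registers on the same device.
function to_write_requests(values: register_value[]): write_request[] {
  const sorted = [...values].sort(
    (a, b) =>
      a.device_address - b.device_address ||
      a.register_address - b.register_address
  );
  const requests: write_request[] = [];
  for (const v of sorted) {
    const last = requests[requests.length - 1];
    if (
      last !== undefined &&
      last.device_address === v.device_address &&
      last.register_address + last.data.length === v.register_address
    ) {
      last.data.push(v.data); // extends the current consecutive run
    } else {
      requests.push({
        device_address: v.device_address,
        register_address: v.register_address,
        operation_type: WRITE_MULTIPLE,
        data: [v.data],
      });
    }
  }
  return requests;
}
```
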

Step 3 - Serialization

What is the difference between a write_request and a protocol_frame? The data structure is the same, as you can see in the class definitions above. The difference is the data format. To send a write_request via a serial port, the system must transform it into a byte array. We don’t need to change the structure of the data anymore, only its representation. We call this process serialize.
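
The serialization step might look like this sketch; the frame layout here (header bytes followed by big-endian 16-bit words) is invented for illustration, not the real wire format:

```typescript
interface write_request {
  device_address: number;
  register_address: number;
  operation_type: number;
  data: number[]; // ushort values
}

// Flatten a write_request into the bytes that go onto the serial line.
// Layout (invented): device, operation, register, word count, payload.
function serialize(req: write_request): number[] {
  const bytes = [
    req.device_address,
    req.operation_type,
    req.register_address,
    req.data.length,
  ];
  for (const word of req.data) {
    bytes.push((word >> 8) & 0xff, word & 0xff); // big-endian ushort
  }
  return bytes;
}
```
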

data flow diagram for writing use case

How to Name Processes

Assign names to processes according to how they transform the data, and only after both input and output data types are specified. It’s hard to name a process properly before knowing its inputs and outputs - there’s a high chance the name will not clearly describe its purpose.

How to Define Data Types

To come up with good data structures for a data flow diagram we need to map them to concepts from the problem domain. This is why it’s considered an analysis technique, not a design technique. The point of data flow analysis is to understand the problem well, which would ideally lead to the most natural solution.

So, if possible, try not to make up artificial data types. For example, a register value and a write request both make sense in the domain. They are natural and stable parts of the problem, which makes them a good foundation for design.

Analyzing the Read Use Case

Reading is performed similarly, but the data flows in the opposite direction.

We deserialize protocol_frames into read_responses, then extract register_values and finally map the data to api_resources.

read_response is obviously different from the write_request we defined earlier, but I’ll skip its definition.
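
As a sketch, deserialization could be the inverse of the serialization shown for the write use case, assuming the same invented frame layout (device, operation, register, word count, big-endian 16-bit values):

```typescript
interface read_response {
  device_address: number;
  register_address: number;
  data: number[]; // ushort values read from the device
}

// Deserialize a response frame back into structured data. The layout
// is the invented one used in the serialization sketch, not the real
// wire format.
function deserialize(bytes: number[]): read_response {
  const [device_address, , register_address, count] = bytes;
  const data: number[] = [];
  for (let i = 0; i < count; i++) {
    const offset = 4 + i * 2;
    data.push((bytes[offset] << 8) | bytes[offset + 1]); // big-endian ushort
  }
  return { device_address, register_address, data };
}
```
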

data flow diagram for reading use case

Analyzing the Configuration Use Case

The mapping of api_resources to and from register_values, which we performed in the previous use cases, was determined by configuration provided from the outside.

This use case is the most trivial - it doesn’t involve any changes to the data structure. The system only stores the mapping data for later use within its boundary.

The data type of resource_mapping looks like this:

class resource_mapping {
    resource_id: string;
    properties: property_map[]; // a set of mappings for properties
}

class property_map {
    property_name: string;
    address: string; // let the address just be a string for simplicity
}


data flow diagram for configuration use case
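
To make the mapping concrete, here is a hypothetical resource_mapping instance and a lookup helper; the resource id and the "device:register" address format are invented:

```typescript
interface property_map {
  property_name: string;
  address: string; // address kept as a string, as in the model above
}

interface resource_mapping {
  resource_id: string;
  properties: property_map[];
}

// A made-up mapping entry, as an administrator might provide it.
const mapping: resource_mapping = {
  resource_id: "sensor-42",
  properties: [
    { property_name: "temperature", address: "1:0x10" }, // invented device:register format
    { property_name: "setpoint", address: "1:0x11" },
  ],
};

// Look up the address configured for a given property name.
function address_of(m: resource_mapping, name: string): string | undefined {
  return m.properties.find((p) => p.property_name === name)?.address;
}
```
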

Combining Diagrams

So far we have performed data flow analysis per use case. Now let’s combine the results into a single diagram to see how everything works together.

complete data flow diagram for I/O server

Going Deeper

One of the nice properties of data flow analysis is that it can be applied recursively. We could take any process, such as map to resource, and further decompose it into even more granular pieces.