Data flow analysis is an old but valuable technique that helps to understand and specify a process through functional decomposition. It’s especially useful when the overall function of the system is well-defined, but there isn’t yet enough information to drive detailed design of modules. I will demonstrate the process of data flow analysis using a real application that I have designed.
Data Flow Diagrams
The result of data flow analysis is a set of data-flow diagrams.
The purpose of a data flow diagram is to show transformations of data as it flows through the system. Each transformation is represented by a named process. The direction of data flow between processes is specified as well.
Here is a generic data flow diagram:
- Arrows are data flows. Each data flow has a data type and a direction.
- Ellipses are processes. Each process has at least one incoming and at least one outgoing data flow. Each process transforms data in some way.
- Rectangles are files. A file represents a data store and has a data type, similarly to a data flow. Processes can read from and write to files. A file doesn’t imply any specific kind of persistence mechanism.
- The dashed shape is the system’s boundary. Everything inside the boundary is what we are going to design and implement. Everything outside the boundary is external context.
Processes in data flow diagrams do not correspond to system components or modules. They represent abstract functional units, each responsible for a specific data transformation. By dealing with processes we avoid making design decisions prematurely. Instead we focus on steps that must be performed by the system to satisfy its overall function. During design, processes identified by analysis can be mapped to components/modules.
Note that a context diagram is a special case of a data flow diagram. In a context diagram we specify data flows crossing the system boundary and ignore what’s happening inside the system.
I will use a simplified version of a real application. It is a web API proxy for a serial communications protocol. The purpose of the tool is to simplify integration of GUI applications with device networks. This is achieved by hiding the low-level protocol behind a resource-oriented API.
The context diagram below indicates the system’s function: to convert API resources passed by clients into protocol frames, which are then sent to the network of connected devices. It works the same way in the opposite direction. The way resources are mapped to network and device addresses is determined by a resource mapping, provided by the system’s administrator.
For simplicity I will limit the system functionality to the three most important use cases:
- read a resource
- write a resource
- configure a resource
Starting at System Boundaries
From the context diagram above we can tell that the system’s responsibility at a high level is transforming data from one form into another. Now it’s time to create a more granular decomposition of the problem by specifying how this transformation can be performed.
But before we can specify intermediate data structures, we need to specify the initial input/output data.
Here is the definition of api_resource. It includes an id field and a set of properties. Each property is a name-value pair.
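The original class definition isn’t reproduced in this text, but based on the description above, a minimal sketch might look like this (field names and the value type are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Property:
    """A name-value pair belonging to a resource."""
    name: str
    value: int  # value type simplified to int for illustration

@dataclass
class ApiResource:
    """A resource exposed by the web API: an id plus a set of properties."""
    id: str
    properties: list[Property]
```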
The protocol_frame type describes a specific request to read or update data in the specified device’s memory register. An api_resource can correspond to one or more protocol_frames.
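As a rough illustration, a protocol_frame could be modeled like this; the exact fields are protocol-specific and assumed here:

```python
from dataclasses import dataclass
from enum import Enum

class Operation(Enum):
    READ = 1
    WRITE = 2

@dataclass
class ProtocolFrame:
    """A request to read or update a device's memory register."""
    device_address: int
    register_address: int
    operation: Operation
    value: int  # payload for writes; ignored for reads
```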
Analyzing the Write Use Case
We already know that when a client passes an instance of an api_resource, the system must convert it to one or more protocol_frames. We can start the analysis either from the beginning or the end of the data transformation chain.
Step 1 - Mapping to Registers
When a resource arrives, the system must determine which register address each property corresponds to.
The output of this step is a list of register_values. A register_value represents a location the system has to update in order to satisfy the incoming request. Let’s call this process map to registers.

To perform the transformation, the process needs additional data: the mapping configuration. It’s called resource to register map on the diagram. The map is shown as a file, i.e. static data stored within the system.
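To make the step concrete, here is a sketch of map to registers under assumed data shapes: the resource is an id plus (name, value) pairs, and the resource to register map is keyed by (resource id, property name):

```python
from dataclasses import dataclass

@dataclass
class RegisterValue:
    """A device register the system has to update."""
    device_address: int
    register_address: int
    value: int

def map_to_registers(resource_id, properties, resource_to_register_map):
    """Translate each (name, value) property into the register it lives in."""
    registers = []
    for name, value in properties:
        device, register = resource_to_register_map[(resource_id, name)]
        registers.append(RegisterValue(device, register, value))
    return registers
```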
Step 2 - Creating a Write Request
At this point the data looks closer to what we need, but it’s still just a list of individual registers to update. We know that the protocol allows writing multiple registers at once, so the system needs to combine register values into larger units of work.
Let’s call them write_requests. To completely represent a request, an additional piece of protocol-specific metadata is necessary: the operation_type. The system will determine its value and add it to the object.
We end up with this model of write_request. Now we can put the process of transforming register_values into write_requests on the diagram.
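A sketch of this step, grouping register values by device into write requests; the grouping rule and the operation code are assumptions:

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class RegisterValue:
    device_address: int
    register_address: int
    value: int

@dataclass
class WriteRequest:
    device_address: int
    operation_type: int           # protocol-specific metadata, assumed numeric
    values: list[RegisterValue]

WRITE_MULTIPLE = 0x10             # placeholder operation code, not a real protocol constant

def create_write_requests(register_values):
    """Combine individual register values into one request per device."""
    by_device = lambda rv: rv.device_address
    ordered = sorted(register_values, key=by_device)
    return [
        WriteRequest(device, WRITE_MULTIPLE, list(group))
        for device, group in groupby(ordered, key=by_device)
    ]
```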
Step 3 - Serialization
What is the difference between a write_request and a protocol_frame? The data structure is the same, as you can see in the class definitions above. The difference is the data format. To send a write_request via a serial port, the system must transform it into a byte array. We don’t need to change the structure of the data anymore, only its representation.
We call this process Serialize.
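A sketch of Serialize under an assumed, simplified frame layout (not the actual protocol’s): one byte each for device address, operation type and register count, followed by big-endian 16-bit (register, value) pairs:

```python
import struct

def serialize(device_address, operation_type, register_values):
    """Turn a write request into the byte array sent over the serial port."""
    # Header: device address, operation type, number of registers.
    frame = bytearray([device_address, operation_type, len(register_values)])
    # Body: each register address and value as big-endian 16-bit integers.
    for register_address, value in register_values:
        frame += struct.pack(">HH", register_address, value)
    return bytes(frame)
```

A real serial protocol would typically also append a checksum or CRC; that is omitted here.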
How to Name Processes
Assign names to processes according to how they modify the data, and only after both input and output data types are specified. It’s hard to name a process properly before knowing its inputs and outputs; there’s a high chance the name will not clearly describe its purpose.
How to Define Data Types
To come up with good data structures for a data flow diagram we need to map them to concepts from the problem domain. This is why it’s considered an analysis technique, not a design technique. The point of data flow analysis is to understand the problem well, which would ideally lead to the most natural solution.
So, if possible, try not to invent artificial data types. For example, a register value and a write request both make sense in the domain. They are natural and stable parts of the problem, which makes them a good foundation for design.
Analyzing the Read Use Case
Reading is performed similarly, but the data flows in the opposite direction. The system must deserialize incoming frames into read_responses, then extract register_values and finally map the data to api_resources. A read_response is obviously different from the read_request we defined earlier, but I’ll skip that detail here.
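The first step of the read direction, deserialization, could be sketched like this; the frame layout is an assumption (device byte, operation byte, count byte, then big-endian 16-bit register/value pairs):

```python
import struct

def deserialize_read_response(frame: bytes):
    """Parse a response frame back into (device, register, value) tuples."""
    device, count = frame[0], frame[2]
    values = []
    for i in range(count):
        # Each pair occupies 4 bytes, starting after the 3-byte header.
        register, value = struct.unpack_from(">HH", frame, 3 + i * 4)
        values.append((device, register, value))
    return values
```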
Analyzing the Configuration Use Case
The mapping of api_resources to and from register_values, which we performed in the previous use cases, was determined by configuration provided from the outside. This use case is the most trivial: it doesn’t involve any changes to the data structure. The system only stores the mapping data for later use within its boundary.
The data type of resource_mapping looks like this:
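The original definition isn’t reproduced in this text; a plausible shape, with assumed key names, might be:

```python
# Hypothetical shape of resource_mapping; key names and nesting are assumptions.
resource_mapping = {
    "sensor-1": {
        "temperature": {"device": 1, "register": 0x0001},
        "humidity": {"device": 1, "register": 0x0002},
    },
}
```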
So far we have performed data flow analysis per use case. Now let’s combine the results into a single diagram to see how everything works together.
One of the nice properties of data flow analysis is that it can be done recursively: we could take any process, such as map to registers, and further decompose it into even more granular pieces.
Feel free to leave your questions, comments or suggestions below. I will get back to you. Subscribe for more.