Software and Systems Visualization Methods and Tools

24.05.2025 - Old-David-Wang

Abstract

Diagrams are essential tools for understanding, communicating, and analyzing complex systems in software and business domains. Drawing on existing materials, this document outlines different types of diagrams, including flowcharts, data flow diagrams, business process models, software architecture diagrams, entity-relationship diagrams, and UML diagrams. It then covers diagramming and analysis techniques used in software engineering, such as control flow graphs, data flow analysis, control dependence graphs, and program dependence graphs, highlighting their applications in areas like program slicing and alias analysis. The discussion is based entirely on the provided source material and aims to explain the characteristics, components, and uses of these representation methods.

Tool List

List of Diagram/Model Type Tools (Including Open-Source Status)

Below is a list of diagram types and the tools mentioned for each. Open-source status is given as stated in the source material.

1. Business Process Models

  Tool Name                     Open-Source Status
  BPMN (Method/Notation)        Not Applicable (Notation)
  Open ModelSphere              Yes
  Software Ideas Modeler        No
  PowerDesigner                 No
  UModel                        No
  Modelio (BPMN2)               Yes

2. Data Flow Diagrams (DFD)

  Tool Name                     Open-Source Status
  Astah                         Unclear
  Software Ideas Modeler        No
  Dia                           Yes
  diagrams.net (draw.io)        Yes

  UML Activity Diagram (can substitute for a DFD)

  Tool Name                     Open-Source Status
  (No specific tool)            Not Applicable (Diagram Type)

3. Flowcharts

  Tool Name                     Open-Source Status
  Astah                         Unclear
  Software Ideas Modeler        No
  diagrams.net (draw.io)        Yes
  Mermaid                       Yes
  Excalidraw                    Yes
  Inkscape                      Yes
  PowerPoint                    No
  Matplotlib                    Yes (Python Library)
  Adobe Illustrator             No
  Dia                           Yes
  OmniGraffle                   No
  yEd                           No
  TikZ                          Yes (LaTeX Package)
  Drakon-chart                  Unspecified

4. Software Architecture Diagrams

  No specific tools listed. Please refer to the related diagram types and the general diagramming tools below.

5. Entity Relationship Diagrams (ERD)

  Tool Name                     Open-Source Status
  Lucidchart                    No
  Astah                         Unclear
  Open ModelSphere              Yes
  Software Ideas Modeler        No
  ERROL (Query Language)        Not Applicable (Language)

6. Unified Modeling Language (UML) Diagrams

  Tool Name                     Open-Source Status
  ArgoUML                       Yes (Archived)
  Astah                         Unclear
  ATL                           Yes (Eclipse M2M Project)
  Together                      No
  BOUML                         Unclear
  Cacoo                         Unclear
  clang-uml                     Yes (Inferred from GitHub link)
  Dia                           Yes
  diagrams.net                  Yes
  Eclipse UML2 Tools            Yes (Eclipse Project)
  Enterprise Architect          No
  Gliffy                        Unclear
  JetUML                        Yes (Inferred from GitHub link)
  Lucidchart                    No
  MagicDraw                     No
  Microsoft Visio               No
  Modelio                       Yes
  MyEclipse                     Unclear
  NClass                        Yes (Inferred from GitHub link)
  NetBeans                      Yes (Part of NetBeans IDE)
  Open ModelSphere              Yes
  Papyrus                       Yes (Eclipse Project)
  PlantUML                      Yes
  PowerDesigner                 No
  PragmaDev Studio              No
  Prosa UML Modeller            Unclear
  Rational Rose XDE             No
  Rational Software Architect   No
  Rational Software Modeler     No
  Rational System Architect     No
  Reactive Blocks               Yes (Eclipse Project)
  Rhapsody                      No
  Software Ideas Modeler        No
  StarUML                       No
  Umbrello UML Modeller         Yes (KDE Project)
  UML Designer                  Yes (Eclipse Project)
  UMLet                         Yes
  UModel                        No
  UmpleClass                    Yes (Inferred from GitHub link)
  WhiteStarUML                  Yes (Fork of StarUML)
  yEd                           No

7. Systems Modeling Language (SysML) Diagrams

  Tool Name                     Open-Source Status
  Enterprise Architect          No
  Modelio                       Yes
  Rhapsody                      No
  Software Ideas Modeler        No
  UModel                        No

8. General Data Models

  Tool Name                     Open-Source Status
  Open ModelSphere              Yes
  PowerDesigner                 No
  UModel                        No

9. Other Mentioned Diagram Types

  • Thread Models: no specific tools listed.
  • Petri Nets: no specific tools listed.
  • Program Network Charts: no specific tools listed.
  • System Resources Charts: no specific tools listed.

10. General Diagramming Tools

  Tool Name                     Open-Source Status
  diagrams.net (draw.io)        Yes
  Mermaid                       Yes
  Excalidraw                    Yes
  Inkscape                      Yes
  PowerPoint                    No
  Matplotlib                    Yes (Python Library)
  Adobe Illustrator             No
  Dia                           Yes
  OmniGraffle                   No
  yEd                           No
  TikZ                          Yes (LaTeX Package)
  Excel                         No

Article

1. Introduction

Visual representations, or diagrams, are crucial for abstracting complex systems into more comprehensible forms, facilitating communication among various stakeholders, from business leaders to software engineers. Different diagrams serve different purposes, focusing on various aspects of a system such as workflows, data flows, structure, or control flow. Based on the provided material, this article explores a range of diagrams and analysis techniques, elucidating their unique characteristics and applications.


2. Diagram Types for Business and System Modeling

In both business and technical contexts, various diagram types are used to model processes, data, and system structures.

2.1. Flowcharts

A flowchart is a diagrammatic representation of a workflow or process. It can also be defined as a diagrammatic representation of an algorithm, outlining a step-by-step approach to solving a task. Flowcharts use different types of boxes to represent steps and connecting arrows to show their sequence. This diagrammatic representation helps illustrate a solution model for a given problem and is used to analyze, design, document, or manage a process or program.

Standard symbols for flowcharts were developed by the American National Standards Institute (ANSI) in the 1960s and adopted by the International Organization for Standardization (ISO) in 1970. The current standard is ISO 5807:1985.

  • Flowcharts typically flow from top to bottom and from left to right.
  • Key symbols include flowlines (lines with arrows) that represent the order of operations.
  • Terminal symbols (stadium-shaped, oval, or rounded rectangles) indicate the start and end of a program or sub-process, often containing words like “Start” or “End.”
  • A process is represented by a rectangle, signifying a set of operations that change data values, form, or location.
  • A decision is a diamond, indicating a conditional operation that determines one of two paths the program will take, typically a Yes/No question.
  • Input/Output is represented by a parallelogram, indicating data input and output.
  • Annotations (open rectangles with dashed or solid lines connecting to the corresponding symbol) provide additional information.
  • A predefined process (rectangle with double vertical borders) represents a named process defined elsewhere.
  • On-page connectors (small circles) and off-page connectors (home-plate-shaped pentagons) substitute for long lines or link to other pages.
  • Other symbols exist for data files/databases (cylindrical), documents, manual operations, manual inputs, and preparation/initialization.
  • Flowcharts can also use a pair of horizontal lines (a bar) to denote parallel processing, marking the start or end of simultaneous operations.
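
The symbols above map closely onto text-based diagramming tools. The following is a minimal sketch, assuming Mermaid's flowchart syntax for the node shapes, of how a small flowchart could be generated from Python data. The function `to_mermaid`, the node ids, and the labels are hypothetical names chosen for illustration.

```python
# Shape delimiters follow Mermaid's flowchart syntax; node ids and labels are made up.
SHAPES = {
    "terminal": ("([", "])"),  # stadium shape: start / end
    "process": ("[", "]"),     # rectangle
    "decision": ("{", "}"),    # diamond
    "io": ("[/", "/]"),        # parallelogram
}


def to_mermaid(nodes, edges):
    """nodes: {id: (kind, label)}; edges: [(src, dst, label or None)]."""
    lines = ["flowchart TD"]  # TD = top-down, the usual flowchart direction
    for node_id, (kind, label) in nodes.items():
        left, right = SHAPES[kind]
        lines.append(f"    {node_id}{left}{label}{right}")
    for src, dst, label in edges:
        arrow = f"-->|{label}|" if label else "-->"  # labeled flowline for decisions
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


nodes = {
    "begin": ("terminal", "Start"),
    "read": ("io", "Read n"),
    "check": ("decision", "Is n positive?"),
    "work": ("process", "Compute result"),
    "finish": ("terminal", "End"),
}
edges = [
    ("begin", "read", None),
    ("read", "check", None),
    ("check", "work", "Yes"),
    ("check", "finish", "No"),
    ("work", "finish", None),
]
diagram = to_mermaid(nodes, edges)
print(diagram)
```

The printed text can be pasted into any Mermaid renderer. The id `finish` is used instead of `end` because `end` is a reserved word in Mermaid flowcharts.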

Flowcharts generally capture one aspect of a process, its control flow, and are therefore often complemented by other types of diagrams. Kaoru Ishikawa identified flowcharts as one of the seven basic tools of quality control. In UML, an activity diagram is a type of flowchart. Other names for flowcharts include process charts, flow diagrams, functional flow diagrams, process maps, functional process charts, business process models, process models, process flow diagrams, workflow diagrams, and business flow diagrams. Various flowchart classifications exist, such as system flowcharts, program flowcharts, decision flowcharts, logic flowcharts, product flowcharts, and process flowcharts.


2.2. Data Flow Diagrams (DFD)

A Data Flow Diagram (DFD) is a tool in structured analysis, data modeling, and threat modeling. It originated from Structured Analysis and Design Technique (SADT) in the mid-1970s and was popularized by Tom DeMarco and Edward Yourdon, among others. In contrast to business process models, which can contain a great deal of detail, DFDs may focus on the movement of data, simplifying process depiction to steps like “an actor inputs data; an application accepts data; an actor views data.”

DFDs are composed of four main components: processes, flows, stores, and terminators.

  • A process (function, transformation) is the part of the system that transforms inputs into outputs. It’s usually represented by a circle, oval, rectangle, or rounded rectangle and named to express its essence.
  • A data flow (flow, data stream) uses an arrow to represent the transmission of information (and sometimes physical material). It ideally transmits only one type of information. Flows connect processes, stores, and terminators.
  • A store (data storage, data store, file, database) is used to store data for later use. It’s often represented by two horizontal lines, and its name is typically a plural noun. A store can represent data files, document folders, or filing cabinets, and its representation is implementation-independent. Flowing into a store usually indicates data input/update, while flowing out from a store usually indicates reading.
  • A terminator is an external entity that communicates with the system but is located outside it. It can be an organization, a group of people, an agency, a department, or another system.

Rules for creating DFDs include choosing entity names that are easy to understand: general enough not to depend on particular individuals, yet specific enough to identify the entity clearly. Processes should be numbered for easy reference. For clarity, a single DFD should contain no more than 6 to 9 processes and at least 3; the exception is the context diagram, which represents the entire system as a single process interacting with all terminators. A process can be refined into a more detailed representation using another DFD. DFDs can be seen as inverted Petri nets. In UML, an activity diagram often takes over the role of a DFD.
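
The four components and the sizing rule can be captured in a small data structure. The sketch below is illustrative only: the class `DFD`, its field names, and the `problems` method are hypothetical, and the upper bound of 9 processes is one reading of the guideline above.

```python
from dataclasses import dataclass, field


@dataclass
class DFD:
    processes: dict                                  # number -> name, e.g. {"1": "Validate order"}
    stores: set = field(default_factory=set)         # plural nouns, e.g. "Orders"
    terminators: set = field(default_factory=set)    # external entities
    flows: list = field(default_factory=list)        # (source, target, data label)
    is_context: bool = False                         # context diagram: one process only

    def problems(self):
        """Return a list of rule violations (empty if none were found)."""
        issues = []
        count = len(self.processes)
        if self.is_context:
            if count != 1:
                issues.append("a context diagram must contain exactly one process")
        elif not 3 <= count <= 9:
            issues.append(f"{count} processes; expected between 3 and 9")
        known = set(self.processes) | self.stores | self.terminators
        for src, dst, label in self.flows:
            for endpoint in (src, dst):
                if endpoint not in known:
                    issues.append(f"flow '{label}' references unknown element '{endpoint}'")
        return issues


context = DFD(
    processes={"0": "Order system"},
    terminators={"Customer"},
    flows=[("Customer", "0", "order"), ("0", "Customer", "invoice")],
    is_context=True,
)
too_small = DFD(processes={"1": "Accept order", "2": "Ship order"})
print(context.problems())
print(too_small.problems())
```

Refinement of a process into a child DFD could be modeled by mapping a process number to another `DFD` instance, which is left out here to keep the sketch short.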


2.3. Business Process Models (BPM)

Business Process Models (BPMs) describe operational business processes and can contain a great deal of detail. Unlike Data Flow Diagrams, which focus on the flow of data, BPMs depict the steps and flow of business processes. One technique for checking whether an activity is really a step is to give it a verb-noun name (e.g., “validate license”) and check that the reversed, noun-verb form (“license validated”) still makes sense and describes a completed transformation. Several modeling approaches exist, such as BPMN. To keep models understandable to stakeholders, simple notations like swimlane diagrams are often used.


2.4. Software Architecture Diagrams

A software architecture diagram is a visual representation of a system’s structure: like a blueprint, it illustrates the system’s components, their interactions, and their relationships. By breaking down intricate structures into easily understandable visuals, such diagrams simplify complex systems and improve communication among stakeholders, from software engineers to CEOs and CIOs.

Creating a software architecture diagram starts with listing the main system components, such as servers, databases, microservices, and APIs, making sure no critical elements are missed. The next step is to define how these components interact, considering data flow, dependencies, and external systems. The level of abstraction should align with the diagram’s purpose:

  • Conceptual Level: Highlights key system elements and their generalized relationships, suitable for business stakeholders.
  • Logical Level: Focuses on functions and connections between components, useful for architects.
  • Physical Level: Details hardware, network configurations, and deployment specifics, aimed at engineers.

Purpose and Importance: Because they help everyone understand a complex system regardless of technical background, software architecture diagrams act as a bridge between technical teams and business stakeholders. They enhance collaboration, support decision-making, reduce errors, and improve project efficiency.


2.5. Entity Relationship Diagrams (ERD)

Entity Relationship Diagrams (ERDs) are used to visualize relationships between entities, commonly in the context of database design. Key concepts include:

  • Entity type: A category of information or objects that can be stored, e.g., “Student.”
  • Entity set: All entities of a specific type at a particular point in time, e.g., “Students attending class today.” An instance is one concrete member of the set, such as a particular student.
  • Entity categories: Entities are categorized as strong entities (defined solely by their own attributes), weak entities (cannot be defined solely by their own attributes), or associative entities (associate entities within entity sets).
  • Entity keys: Attributes that uniquely identify entities. Types include superkey (one or more attributes defining an entity), candidate key (a minimal superkey), primary key (the chosen candidate key), and foreign key (identifies relationships between entities).
  • Relationship: Describes how entities interact or are associated, often viewed as a verb (e.g., a student enrolls in a course). Relationships are typically represented by diamonds or labels on connecting lines. A recursive relationship is when the same entity participates in a relationship multiple times.
  • Attributes: Characteristics that describe entities (like adjectives, e.g., “a sophomore student”) or relationships (like adverbs, e.g., “digitally”).
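
At the physical level these concepts map onto relational tables. The following sketch uses Python's built-in `sqlite3` module; the table and column names are hypothetical and only illustrate primary keys, foreign keys, and an associative entity that carries a relationship attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE student (              -- strong entity
    student_id INTEGER PRIMARY KEY, -- primary key (the chosen candidate key)
    name       TEXT NOT NULL,
    year       TEXT                 -- entity attribute, e.g. 'sophomore'
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE enrollment (           -- associative entity for 'student enrolls in course'
    student_id INTEGER NOT NULL REFERENCES student(student_id),  -- foreign key
    course_id  INTEGER NOT NULL REFERENCES course(course_id),    -- foreign key
    mode       TEXT,                -- relationship attribute, e.g. 'digitally'
    PRIMARY KEY (student_id, course_id)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Ada', 'sophomore')")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (1, 10, 'digitally')")

# Enrolling a student who does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO enrollment VALUES (99, 10, 'digitally')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print("foreign key enforced:", fk_enforced)
```

The composite primary key on `enrollment` is one possible design choice; a surrogate key with a uniqueness constraint would model the same relationship.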

Various ERD representation styles exist, including Chen, Crow’s Foot/Martin/Information Engineering, Bachman, IDEF1X, and Barker. ER models and data models can be drawn at different levels of detail: conceptual (highest level, least detail, showing overall scope), logical (more detail, defining operational/transaction entities, technology-independent), and physical (specific technical details for database implementation). These levels are similar to those used in other diagram types like DFDs but differ from the three-schema approach in software engineering.


2.6. Unified Modeling Language (UML) Diagrams

The Unified Modeling Language (UML) is a standard modeling notation used in software development, and it encompasses many diagram types. UML diagrams fall into two broad categories: structure diagrams (class, component, composite structure, deployment, object, package, and profile diagrams) and behavior diagrams (activity, state machine, and use case diagrams). The behavior diagrams also include the interaction diagrams as a subgroup (communication, sequence, interaction overview, and timing diagrams). An activity diagram is a type of flowchart, and a use case diagram helps clarify requirements, detect problems, and simplify system design. UML diagrams can also be used for technical documentation, and many software tools support creating them. SysML is a related but distinct modeling language.


3. Diagrams and Techniques for Software Analysis

Beyond general system modeling, specific graphical representations and analysis techniques are employed in software engineering to understand and analyze program code.

3.1. Control Flow Graphs (CFG)

A Control Flow Graph (CFG) is a directed graph in which each node represents a basic block and each edge represents a possible transfer of control between basic blocks. A basic block is a sequence of consecutive statements in which control enters at the beginning and leaves at the end, with no branches except at the end. Basic blocks are constructed by identifying “leaders”: the first statement in the program, any statement that is the target of a goto, and any statement immediately following a goto. Each basic block begins with a leader and includes the subsequent statements up to the next leader or the end of the program. Although a basic block is by definition a maximal sequence of this kind, in source code analysis each statement is often treated as its own basic block.

Each node has successors and predecessors determined by its edges. Edges leaving a conditional transfer (such as while or if-else) are labeled “T” or “F”, while a switch statement can be represented by a single node with multiple labeled outgoing edges, one for each case value. For clarity, “join” nodes can be inserted where control flow merges. Non-executable structural statements (like end or else) are generally not included as nodes.
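
The leader rule can be sketched in a few lines. The example below assumes a hypothetical, non-empty program encoded as a list of `(op, arg)` tuples, where `goto` and `if_goto` carry the index of their target statement. The encoding and the function names are illustrative assumptions.

```python
def find_leaders(program):
    """First statement, every jump target, and every statement after a jump."""
    leaders = {0}
    for i, (op, arg) in enumerate(program):
        if op in ("goto", "if_goto"):
            leaders.add(arg)
            if i + 1 < len(program):
                leaders.add(i + 1)
    return sorted(leaders)


def basic_blocks(program):
    """Each block runs from a leader up to (not including) the next leader."""
    leaders = find_leaders(program)
    bounds = leaders + [len(program)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]


def cfg_edges(program, blocks):
    """Edges between block numbers, labeled 'T'/'F' for conditional jumps."""
    block_of = {block[0]: number for number, block in enumerate(blocks)}
    edges = []
    for number, block in enumerate(blocks):
        op, arg = program[block[-1]]  # only the last statement can branch
        fallthrough = number + 1 if number + 1 < len(blocks) else None
        if op == "goto":
            edges.append((number, block_of[arg], ""))
        elif op == "if_goto":
            edges.append((number, block_of[arg], "T"))
            if fallthrough is not None:
                edges.append((number, fallthrough, "F"))
        elif fallthrough is not None:
            edges.append((number, fallthrough, ""))
    return edges


program = [
    ("assign", "i = 0"),       # 0
    ("assign", "s = 0"),       # 1
    ("if_goto", 6),            # 2: if i >= n goto 6
    ("assign", "s = s + i"),   # 3
    ("assign", "i = i + 1"),   # 4
    ("goto", 2),               # 5
    ("assign", "print(s)"),    # 6
]
blocks = basic_blocks(program)
edges = cfg_edges(program, blocks)
print(blocks)
print(edges)
```

Statement 2 forms a block of its own because it is both a jump target and the end of a block, which is how a loop header typically appears in a CFG.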


3.2. Data Flow Analysis

Data Flow Analysis categorizes variable references as either definitions (when a variable receives a value) or uses (when a variable’s value is retrieved). Uses are further divided into computation uses (c-uses), which affect computations or outputs, and predicate uses (p-uses), which affect control flow.

A key concept is reaching definitions: definitions that might reach a specific program point along some path and are not “killed” by other definitions of the same variable along that path. Data flow analysis algorithms compute the following sets for each program node:

  • GEN set: Definitions generated within the node.
  • KILL set: Definitions killed if they reach the node’s entry.
  • IN set: Definitions reaching the point just before the node. It’s the union of the OUT sets of its immediate predecessors.
  • OUT set: Definitions reaching the point just after the node. It includes definitions generated in the node or those that reached the entry and were not killed within the node ($GEN \cup (IN - KILL)$).

The algorithm ReachingDefs iteratively computes IN and OUT sets until stability, essentially simulating all possible program executions to propagate definitions as far as possible without being killed. This analysis can be used to solve other problems like reachable uses, available expressions, and live variables.
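
The iteration can be sketched as follows. This is a simplified version under stated assumptions, not necessarily the exact ReachingDefs algorithm from the source: each CFG node holds at most one definition, a definition is written as a `(variable, node)` pair, and the example CFG is hypothetical.

```python
def reaching_defs(succ, defs):
    """succ: node -> list of successor nodes; defs: node -> variable defined there."""
    nodes = list(succ)
    pred = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    all_defs = {(var, n) for n, var in defs.items()}
    gen = {n: ({(defs[n], n)} if n in defs else set()) for n in nodes}
    # A definition of v kills every other definition of v.
    kill = {
        n: ({d for d in all_defs if d[0] == defs[n]} - gen[n]) if n in defs else set()
        for n in nodes
    }

    in_sets = {n: set() for n in nodes}
    out_sets = {n: set(gen[n]) for n in nodes}
    changed = True
    while changed:  # repeat until no OUT set changes
        changed = False
        for n in nodes:
            in_sets[n] = set().union(*(out_sets[p] for p in pred[n]))
            new_out = gen[n] | (in_sets[n] - kill[n])  # OUT = GEN ∪ (IN − KILL)
            if new_out != out_sets[n]:
                out_sets[n] = new_out
                changed = True
    return in_sets, out_sets


# 1: x = 1   2: y = 2   3: if x < y   4: x = y   5: z = x + y
succ = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
defs = {1: "x", 2: "y", 4: "x", 5: "z"}
in_sets, out_sets = reaching_defs(succ, defs)
print(sorted(in_sets[5]))
```

At node 5 both definitions of `x` reach the entry, because the definition at node 1 survives along the path that skips node 4.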


3.3. Definition-Use Pairs and Data Dependence Graphs

A definition-use pair (DU pair) for a variable $v$ is an ordered pair $(D, U)$ where statement $D$ defines $v$, statement $U$ uses $v$, and there exists a path from $D$ to $U$ in the CFG that does not redefine $v$ (a def-clear path). These pairs represent data interactions where a computation at $U$ depends on data computed at $D$, which is called flow dependence. An upwards exposed use within a node refers to a use that can be reached by a definition arriving at the node’s entry.

Data Dependence Graphs (DDGs) can be used to graphically represent data dependencies, where nodes represent definition/use locations and edges represent dependencies. Alternatively, data dependence edges can be added within the CFG; for each DU pair $(D, U)$, an edge $(D, U)$ is added.
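
DU pairs can also be found directly by searching for def-clear paths. The sketch below assumes one statement per CFG node and hypothetical `defs` and `uses` maps; within a node, uses are taken to happen before the definition, as in `x = x + 1`.

```python
def du_pairs(succ, defs, uses):
    """Return {(variable, D, U)} for every definition-use pair in the CFG."""
    pairs = set()
    for d, defined in defs.items():
        for var in defined:
            # Walk forward from D and stop along a path once var is redefined.
            stack, seen = list(succ[d]), set()
            while stack:
                n = stack.pop()
                if n in seen:
                    continue
                seen.add(n)
                if var in uses.get(n, set()):
                    pairs.add((var, d, n))
                if var in defs.get(n, set()):
                    continue  # var is redefined here: paths through n are not def-clear
                stack.extend(succ[n])
    return pairs


# 1: x = 1   2: y = 2   3: if x < y   4: x = y   5: z = x + y
succ = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
defs = {1: {"x"}, 2: {"y"}, 4: {"x"}, 5: {"z"}}
uses = {3: {"x", "y"}, 4: {"y"}, 5: {"x", "y"}}

pairs = du_pairs(succ, defs, uses)
data_dependence_edges = {(d, u) for _, d, u in pairs}
print(sorted(pairs))
```

Dropping the variable name from each triple gives the data dependence edges `(D, U)` that can be added to the CFG.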


3.4. Program Paths

A path in a CFG is a sequence of nodes connected by edges. A complete path is a path from the entry node to the exit node. An infeasible path is a path that cannot be executed by any program input due to conditional statements. A du-path is a path connecting a definition node and a use node (for a variable) that is def-clear for that variable.
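
These path definitions can be made concrete on a small CFG. The sketch below enumerates loop-free complete paths only (each node is visited at most once) and checks the du-path condition; the function names and the example graph are hypothetical. Deciding whether a path is infeasible requires reasoning about the conditions themselves and is not attempted here.

```python
def complete_paths(succ, entry, exit_node):
    """All paths from entry to exit that visit each node at most once."""
    paths = []

    def walk(node, path):
        path = path + [node]
        if node == exit_node:
            paths.append(path)
            return
        for nxt in succ[node]:
            if nxt not in path:  # skip repeated nodes, so loops are not unrolled
                walk(nxt, path)

    walk(entry, [])
    return paths


def is_du_path(path, var, defs, uses):
    """True if path starts at a definition of var, ends at a use, and is def-clear."""
    first, last = path[0], path[-1]
    return (
        var in defs.get(first, set())
        and var in uses.get(last, set())
        and all(var not in defs.get(n, set()) for n in path[1:-1])
    )


# 1: x = 1   2: y = 2   3: if x < y   4: x = y   5: z = x + y
succ = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
defs = {1: {"x"}, 2: {"y"}, 4: {"x"}, 5: {"z"}}
uses = {3: {"x", "y"}, 4: {"y"}, 5: {"x", "y"}}

paths = complete_paths(succ, 1, 5)
print(paths)
print([is_du_path(p, "x", defs, uses) for p in paths])
```

Only the path that bypasses node 4 is a du-path for `x` from node 1 to node 5, since node 4 redefines `x`.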


3.5. Control Dependence Graphs (CDG)

A Control Dependence Graph (CDG) represents control dependencies: a statement is control dependent on a predicate when the outcome of that predicate determines whether the statement executes. Nodes represent statements, or code regions whose statements share a common control dependence. CDG construction can involve augmenting the CFG with start and end nodes and building a post-dominator tree. A CDG includes statement nodes (circular) and predicate nodes (rounded boxes), with labeled edges emanating from predicate nodes. Region nodes (hexagonal) can be added to summarize control dependencies for statements within a region.
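
One common way to derive control dependencies uses post-dominators: Y is control dependent on X if Y post-dominates some successor of X but does not strictly post-dominate X itself. The sketch below uses post-dominator sets rather than an explicit post-dominator tree, and it assumes every node other than the exit has at least one successor. The `START` and `EXIT` nodes and all function names are illustrative.

```python
def post_dominators(succ, exit_node):
    """pdom[n] = nodes that lie on every path from n to the exit (including n)."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(succ, exit_node):
    """Return {(X, Y)} meaning 'Y is control dependent on X'."""
    pdom = post_dominators(succ, exit_node)
    deps = set()
    for x, targets in succ.items():
        for s in targets:
            for y in pdom[s]:
                if y not in pdom[x] - {x}:  # y does not strictly post-dominate x
                    deps.add((x, y))
    return deps


# Augmented CFG: START branches to the entry node and to EXIT.
# 1: x = 1   2: y = 2   3: if x < y   4: x = y   5: z = x + y
succ = {
    "START": [1, "EXIT"],
    1: [2], 2: [3], 3: [4, 5], 4: [5], 5: ["EXIT"],
    "EXIT": [],
}
deps = control_dependences(succ, "EXIT")
print(deps)
```

Statements 1, 2, 3, and 5 always execute once the program starts, so they depend only on `START`, while statement 4 depends on the predicate at node 3.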


3.6. Program Dependence Graphs (PDG)

A Program Dependence Graph (PDG) combines control dependence and data dependence information in a single graph. It can be seen as a CDG augmented with data dependence edges. PDG nodes correspond to program statements, predicates, or regions. Edges represent control dependencies (labeled) and data dependencies.


3.7. Program Slicing

Program slicing, originally used for debugging, extracts statements and predicates that might affect the value of a variable at a specific program point. Slices can be computed using a PDG by performing a backward traversal from the point of interest, including all nodes reachable through control and data dependence predecessors. A dynamic slice considers a specific execution path and restricts the static slice to the traversed nodes. A forward slice identifies statements and predicates affected by a variable’s value at a point, computed through the transitive closure of direct dependencies.
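
With the PDG stored as a set of dependence edges, both static slice directions reduce to graph reachability. The sketch below uses a hypothetical PDG for a five-statement program; the function name `slice_nodes` is an assumption for illustration.

```python
from collections import defaultdict


def slice_nodes(edges, start, backward=True):
    """Nodes reachable from start along dependence edges.

    edges: iterable of (source, target), meaning 'target depends on source'.
    backward=True follows edges against their direction (backward slice).
    """
    adjacent = defaultdict(set)
    for src, dst in edges:
        if backward:
            adjacent[dst].add(src)
        else:
            adjacent[src].add(dst)
    result, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(adjacent[n])
    return result


# 1: x = 1   2: y = 2   3: if x < y   4: x = y   5: z = x + y
control_edges = {(3, 4)}
data_edges = {(1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)}
pdg_edges = control_edges | data_edges

backward_slice = slice_nodes(pdg_edges, 4)
forward_slice = slice_nodes(pdg_edges, 1, backward=False)
print(sorted(backward_slice))
print(sorted(forward_slice))
```

The backward slice for statement 4 leaves out statement 5, which cannot affect it. A dynamic slice would additionally intersect the result with the nodes visited in one recorded execution.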


3.8. Alias Analysis

Alias analysis identifies situations where different names (like variables or dereferenced pointers) refer to the same memory location at a given program point. This is particularly important in languages that include pointers. Results are often expressed using points-to sets, indicating which fixed-location variables (non-pointer variables) a pointer might refer to. Alias analysis helps uncover side effects introduced by indirect memory modifications. Points-to information is typically MAY information, meaning it might be true during execution, depending on the execution path.

Traditional Definition-Use Analysis (DUA) is complicated by pointers. To address this, variable references are classified into four types: DDEF (definite definition), PDEF (possible definition via a pointer), DUSE (definite use), and PUSE (possible use via a pointer). Data flow algorithms are adjusted to consider both definite and possible definitions and uses when identifying DU pairs, but only definite definitions are considered to kill other definitions. An assignment through a dereferenced pointer ($*p$) can be treated as a definite definition of $*p$ (and a possible definition of its aliases) and a definite use of the pointer variable $p$ itself.
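
The classification of a store through a pointer can be sketched as follows. This is an illustration under assumptions: the points-to sets are given as input rather than computed, and the function names `classify_store` and `killed_names` are hypothetical.

```python
def classify_store(pointer, points_to):
    """Classify the references made by the statement `*pointer = <value>`."""
    refs = [
        ("DUSE", pointer),        # the pointer variable itself is definitely read
        ("DDEF", "*" + pointer),  # the dereferenced name is definitely defined
    ]
    # Every variable the pointer MAY refer to is only possibly defined.
    refs += [("PDEF", target) for target in sorted(points_to.get(pointer, ()))]
    return refs


def killed_names(refs):
    """Only definite definitions kill earlier definitions of the same name."""
    return {name for kind, name in refs if kind == "DDEF"}


points_to = {"p": {"a", "b"}}  # hypothetical MAY points-to set for p
refs = classify_store("p", points_to)
print(refs)
print(killed_names(refs))
```

Because `a` and `b` are only possibly defined, earlier definitions of them still reach later uses, which keeps the resulting DU pairs conservative.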


4. Diagramming Tools

A variety of software tools are available for creating these diagrams. The material mentions tools such as diagrams.net (app.diagrams.net), TikZ (for LaTeX), Mermaid, Excalidraw, Ipe, Inkscape, Dia, OmniGraffle, PowerPoint, Matplotlib, Adobe Illustrator, Figma, PlantUML, Lucidchart, Visio, and Miro. Many of these tools support multiple diagram types, including those discussed in this article.


5. Conclusion

The material showcases the diverse range of diagrams used in business and software development. From high-level business process models and software architecture views to detailed program control flow graphs and dependency analyses, each diagram type offers a unique perspective and serves a specific analytical or communication goal. Understanding these different diagram types and their associated techniques is crucial for effective system design, development, and analysis.