
Business processes. Extracting a BPMN model from a document. Part 1

Modern business process optimization and automation projects usually begin with the analysis of large volumes of the Customer's documents, so that as-is business processes can be modeled from them in a short time. The documents to be analyzed may include regulations, industry standards, interview protocols, technical specifications and other corporate documents.

The project analyst is thus assigned a laborious and, at the same time, routine task that currently has no automation tools. An analysis of modern business process modeling tools shows that even applications as well known on the market as Enterprise Architect, Business Studio and Bizagi Modeler have no mechanisms for building business process models from a textual description.

This article addresses the problem of extracting a BPMN model from a document.



Note that the business process management (BPM) market already offers a technology for the intelligent analysis of processes, Process Mining. However, unlike the technology described below, the input to a Process Mining system is a database with the results of executing the business process being modeled, not a set of documents with its textual description.

Problem statement


The ideal task can be pictured as a “big red button”: pressing it automatically converts the entire body of documents under analysis into a network of BPMN models of the Customer’s business processes, ready for analysis, optimization and automation.

Solving the problem in this formulation is a matter for the future. For a realistic pilot task, we introduce a number of logical and technical constraints.

Objective: to minimize the effort of building a business process model from its text description while ensuring the completeness and connectedness of the model.

The input is a document in Microsoft Word format, which:


The output is an XML file in BPMN 2.0 format, which:


As a test example, we will use a text description of a widespread process, Incident Management, from the ITIL (Information Technology Infrastructure Library) standard. The test case is deliberately taken in English: English has no grammatical cases, which simplifies the processing of references (coreferences) to business process elements within the pilot task (this will be discussed in more detail in the second part).

The output should be an incident management model “no worse than” the flowchart provided in the ITIL library. By the “no worse” criterion we mean the completeness and connectedness of the business functions, data, decision conditions and participants of the business process.


Figure 1. A flowchart of the Incident Management process (ITIL v.3 Official Introduction, p.98)

Solution concept


According to the BPMN glossary (Business Process Model and Notation, version 2.0), a business process (Process) is represented as "the graph of Flow-elements (a set of activities, events, gateways) and the Sequence Flow relationships that link them into an executable stream."

Definition. By a BPMN graph we mean a finite directed graph (in the sense of graph theory) with the following extensions:

  1. The vertices of the graph correspond to the BPMN elements of the process (Flow, Data, Participant).
  2. The edges of the graph correspond to the BPMN connections of the process (Sequence Flow, Message Flow, Association).
  3. Vertices and edges have mandatory attributes: identifier (id), name (name), comment (documentation).
  4. The mandatory vertex types are the elements of the Flow category (Activity, Event, Gateway).
  5. The mandatory edge types are the control flow connections (Sequence Flow).
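
The definition above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the structure used in the project: the class names `Vertex`, `Edge` and `BpmnGraph`, the method `has_required_types` and the sample element names are all hypothetical, and reading extensions 4 and 5 as "at least one Flow vertex and at least one Sequence Flow edge" is one possible interpretation.

```python
from dataclasses import dataclass, field

# Vertex types of the Flow category (extension 4).
FLOW_TYPES = {"Activity", "Event", "Gateway"}


@dataclass
class Vertex:
    id: str
    name: str
    type: str                   # "Activity", "Event", "Gateway", "Data" or "Participant"
    documentation: str = ""


@dataclass
class Edge:
    id: str
    name: str
    source: str                 # id of the source vertex
    target: str                 # id of the target vertex
    type: str = "SequenceFlow"  # or "MessageFlow", "Association"
    documentation: str = ""


@dataclass
class BpmnGraph:
    vertices: dict = field(default_factory=dict)  # vertex id -> Vertex
    edges: list = field(default_factory=list)

    def add_vertex(self, vertex: Vertex) -> None:
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge: Edge) -> None:
        # An edge may only connect vertices that are already in the graph.
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.vertices:
                raise KeyError(f"unknown vertex: {endpoint}")
        self.edges.append(edge)

    def has_required_types(self) -> bool:
        # Assumed reading of extensions 4 and 5: at least one Flow vertex
        # and at least one Sequence Flow edge must be present.
        has_flow = any(v.type in FLOW_TYPES for v in self.vertices.values())
        has_sequence = any(e.type == "SequenceFlow" for e in self.edges)
        return has_flow and has_sequence


# Illustrative element names, not taken from the marked-up test document.
graph = BpmnGraph()
graph.add_vertex(Vertex("a1", "Log incident", "Activity"))
graph.add_vertex(Vertex("g1", "Major incident?", "Gateway"))
graph.add_edge(Edge("f1", "", "a1", "g1"))
```

Keeping vertices in a dictionary keyed by `id` makes it cheap to check that every edge refers to existing elements, which is one simple way to watch over the connectedness of the model.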

Statement 1. A textual description of a business process in a document (in natural language) contains the BPMN graph in implicit form.

Statement 2. The task of extracting BPMN models from a document belongs to the class of information extraction tasks for weakly structured machine-readable documents, whose main subtasks are named entity recognition, relationship extraction and reference resolution.

Combining algorithms from graph theory and information extraction, we obtain the following solution steps.

  1. Marking up the document with BPMN tags (to identify process elements).
  2. Compiling the BPMN tags into a BPMN process model (to identify process connections).
  3. Verification of the BPMN model (to resolve links).
  4. Correction of the BPMN model (if the model does not match the text description).
  5. Export of the BPMN model to an XML file (to convert the BPMN graph to a standard format).
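
To give a feel for the last step, here is a minimal sketch of writing a few flow elements and sequence flows to BPMN 2.0 XML with Python's standard `xml.etree.ElementTree` module. It is not the project's exporter: the function `export_bpmn`, its tuple-based input format, the `targetNamespace` value and the sample element names are assumptions made for illustration. The element names (`definitions`, `process`, `startEvent`, `task`, `endEvent`, `sequenceFlow`) and the namespace URI come from the BPMN 2.0 schema; diagram layout information (BPMNDI) is omitted.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 semantic model.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def export_bpmn(process_id, nodes, flows):
    """Serialize a process to a BPMN 2.0 XML string.

    nodes: list of (id, bpmn_element_name, name), e.g. ("t1", "task", "Log incident")
    flows: list of (id, source_id, target_id) for sequence flows
    """
    ET.register_namespace("bpmn", BPMN_NS)
    definitions = ET.Element(
        f"{{{BPMN_NS}}}definitions",
        {"id": "definitions_1", "targetNamespace": "http://example.com/bpmn"},
    )
    process = ET.SubElement(
        definitions,
        f"{{{BPMN_NS}}}process",
        {"id": process_id, "isExecutable": "false"},
    )
    for node_id, element_name, name in nodes:
        ET.SubElement(process, f"{{{BPMN_NS}}}{element_name}",
                      {"id": node_id, "name": name})
    for flow_id, source_id, target_id in flows:
        ET.SubElement(process, f"{{{BPMN_NS}}}sequenceFlow",
                      {"id": flow_id, "sourceRef": source_id, "targetRef": target_id})
    return ET.tostring(definitions, encoding="unicode")


# Illustrative fragment of an incident management process.
xml_text = export_bpmn(
    "incident_management",
    nodes=[
        ("e1", "startEvent", "Incident reported"),
        ("t1", "task", "Log incident"),
        ("e2", "endEvent", "Incident closed"),
    ],
    flows=[("f1", "e1", "t1"), ("f2", "t1", "e2")],
)
```

The resulting string can be written to a `.bpmn` or `.xml` file. Whether a particular modeling tool opens a file without layout information depends on the tool.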


Figure 2. Process diagram of extracting a BPMN model from a document (BPMN Text Extraction)

Solution. Step 1: Marking up the document with BPMN tags


To mark the BPMN elements of a business process in the document, we will use BPMN tags.

Definition. A BPMN tag is a colored text marker with an identifier that contains the type of the BPMN element. The name and color of a BPMN tag correspond to a specific category of BPMN elements.

Below are the colors, categories and types of BPMN tags, as well as recommendations for marking up the document (finding exact rules for identifying BPMN elements is a task for the next stage of the project).


Table 1. Description of BPMN tags

The general principle for operations with BPMN tags: select a piece of text containing a BPMN element and press the button of the corresponding BPMN tag.
For example, to mark a business process, select "INCIDENT MANAGEMENT", then click the <Business Process> button. The background of the selected BPMN element takes the color of the chosen BPMN tag, and a bookmark with the BPMN tag identifier is added to the document's bookmarks.


Figure 3. The menu bar of the BPMN tab (a group of BPMN tags, Edit tags)

The following are the main operations on BPMN tags:


Marking up the test document yields the following result.


Figure 4. BPMN markup of the text description of the Incident Management process

Note that the text contains “duplicate” BPMN tags: tags with the same text and color (for example, Service Desk, Problem Management, Incident Record) are references to the same process element. The processing of such references (coreferences) will be considered in step 2 of the solution.
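
As a rough illustration of what treating duplicate tags as one element could look like, the sketch below groups tags by category and text. It is not the algorithm of step 2, which is outside this part. The function `group_duplicate_tags`, the tuple layout, the tag identifiers and the categories assigned to the sample tags are hypothetical. The tag category stands in for the tag color, and ignoring case and extra whitespace when comparing texts is an added assumption.

```python
from collections import defaultdict


def group_duplicate_tags(tags):
    """Group tags that share a category and (normalized) text.

    tags: list of (tag_id, category, text)
    Returns a dict mapping (category, normalized_text) to the list of tag ids
    that are taken to refer to the same process element.
    """
    groups = defaultdict(list)
    for tag_id, category, text in tags:
        # Assumption: case and repeated whitespace do not matter.
        normalized = " ".join(text.split()).lower()
        groups[(category, normalized)].append(tag_id)
    return dict(groups)


# Hypothetical tags; the ids and categories are made up for the example.
tags = [
    ("tag_1", "Participant", "Service Desk"),
    ("tag_2", "Data", "Incident Record"),
    ("tag_3", "Participant", "Service  Desk"),
    ("tag_4", "Participant", "Problem Management"),
    ("tag_5", "Data", "incident record"),
]
groups = group_duplicate_tags(tags)
```

Each group could then be mapped to a single vertex of the BPMN graph, with the individual tags kept as pointers back to the places in the text where the element is mentioned.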

To be continued…

Source: https://habr.com/ru/post/439934/