Prolog Diagnostic Medical Expert System

Introduction

As I was fortunate enough to choose the theme of the thesis in software engineering, and I chose to write an expert system, and it was in the Prolog language. Although it is almost never used in industrial programming, it is interesting theoretically to allow the fastest way to touch intelligent systems (IC). Also, the language itself is interesting in terms of sports, as it makes you think in an unusual manner, different from the thinking of procedural programming and OOP, which is a good training for the brain.

The Prolog implementation was used - Visual Prolog, with built-in GUI libraries. But if
If you want to write a GUI in Qt / C ++, then the documentation has instructions on how to import the program into a DLL, and compile it together with a C / C ++ project. It follows that it is possible to combine with other languages.

In general, when I was working on this project, I did not find examples that are not sufficiently primitive, but at the same time, and not as large as the sophisticated scientific examples. Therefore, this article can be a very good help for beginners, and people who want to write at least a little like IP.

Description of the purpose of the system: there is a set of parameters to which the cardiac signal (ECG) must correspond, by which the disease can be determined. The person answers the questions of the system (in the dialogue mode), and the system, acting as an expert on cardiac diseases, draws conclusions about a potential human disease.

Those. This code can be considered as a framework (prolog-framework), for creating expert systems from other areas, simply substituting its own rules and data. The rules for insertion will be described below.

First order predicate calculus and Prolog language

The Prolog language implements the logical programming paradigm, the idea of which is the use of computing technology for the logical inference from the declarative description of the subject domain. This distinguishes it from procedural programming languages, which describe well-defined sequential actions. Therefore, procedural languages are not suitable for writing ES, and if they are used, then only as auxiliary modules. Also, a distinctive feature of the Prolog language is the use of a subset of the first-order logic language for the description of a program, which is convenient and simpler than the others, transferred from the natural language of the description of the world by man. When creating an ES, this is one of the important problems - how to translate the knowledge of the subject area into a limited formal language, while preserving the information content invested by man. There is also a simple opportunity to translate knowledge back from the formal to the natural form, due to intuitive clarity and simplicity, compared with second-order logic or other mathematical calculus.

Prolog language constructs can be represented as type implications:

A1, A2, …, An <= B1, B2, …,Bm (n>=0, m>=0) (1)

where Ai and Bi are atomic formulas Fi (t1, t2, ..., tk) or negation Fi (t1, t2, ..., tk), where tk are terms (individual constant, variable or result of applying a function)

In Prolog, all relationships are recorded as facts and rules, where n = 1 in formula (1), to increase the efficiency of inference. The rules correspond to a formula called the Horn formula:

A<=B1, B2, …, Bm ,где m>0 (2)

Facts correspond to formula (2) with m = 0:

A<= (3)

The formula to be proved in the process of the inference mechanism is the formula (1) with n = 0, m> 0, is called the goal or query:

<= B1, B2, …Bm (4)

Of these language constructs, the entire Prolog program is composed.
For example, the following knowledge, in the form of a sentence in a natural language - “The Son Ani loves Masha” can be written as a fact:

 любит(сын(аня), маша)

and the sentence “Anya loves everyone whom Olya loves” in the form:

 (любит(аня,X)<=любит(оля,X))

Or reasoning “Every person is mortal. Confucius is a man ”:

 (человек(конфуций)& (смертен(X)<=человек(X)) <= смертен(конфуций)

Prolog also provides tools for defining recursive data structures such as trees and lists, which brings additional possibilities for conveniently describing relevant knowledge.

Thus, it is clear that Prolog provides a flexible, simple, and intuitive language for describing knowledge.

But then the question of choosing the implementation of this language arises, and it was decided to develop an ES on the extension of the Prolog language - Visual Prolog.

Visual prolog

Visual Prolog is a Prolog dialect with an object-oriented extension along with an integrated development environment. This environment provides support for creating graphical interfaces and many other libraries. The prologue is quite popular for creating an ES, with a simple syntax that is similar in meaning to subject matter relationships and facts. In general, the Prolog interpreter itself can be viewed as an ES in terms of first-order predicate logic, in which the user asks a question as a goal, the truth of which needs to be proved. This is a declarative language, it describes what needs to be obtained, instead of a sequential algorithm, perfectly suited to describe the knowledge of a small ES. Compared to alternative environments AMZI Prolog and SWI-Prolog, it provides a very effective interface for interacting with other languages, either by linking object files or by dynamically loading DLL libraries, like other languages, or as an independent DLL module. Also Visual Prolog is well documented, has many examples. There is also an opinion that the choice of implementation language, the way of presenting knowledge and the method of forming reasoning are secondary, compared to expert knowledge, and affect only the mechanism of their successful use. However, the availability of literature using the prologue as a means of creating an ES indicates the suitability of its use, at least in small systems.

ES design

In the literature, it is customary to divide ES into several main components, which are little dependent on each other: the knowledge base, the output mechanism, and the user interface.

There are 2 types of organization of expert systems: on rules and on facts.

Knowledge of ES on the rules is written in the form of production rules. The left part of each rule contains some alternative solution to the question, and the right parts (premises) are specified by other rules. The only exception is when the rule checks for the presence of information in the database, for example, the answer to a question. The algorithm of the inference mechanism operation is reduced to a comparison, then with a variety of options, the right one is chosen according to the principle of conflict resolution, and at the end the selected rule is applied and everything starts all over again. The advantages of such an organization is that it is very easy to add rules to the system without affecting other rules, and it is easy to supplement and modify it, because the programmer simply needs to insert the necessary rule into the appropriate rule block. Also, such an organization has limitations, namely, it is necessary to organize an unnatural algorithm for saving traces or use a variable in the rules for it, but this will break the convenience of modifying the rules, since for each predicate the rules will have to be organized by saving traces. It will also violate the readability of the program and the possibility of modifying the output mechanism in general, since along with the information about the rules, a crawl account will be included through the action of the output mechanism. Additionally, the Prolog does not allow saving the rules to the database file and reading the rules from the database, which would not allow updating the knowledge base while the program is running. Listing 1 gives an example of such an organization knowledge base.

Listing 1

 diag("SIRS"):- diag2("SIRS"). diag("Sepsis"):- diag("SIRS"), have("Sepsis character"). diag("Hard sepsis"):- diag("Sepsis"), have("Hard sepsis character"). diag("Shock sepsis"):- diag("Hard sepsis"), have("Shock sepsis character"). diag("MOF"):- diag("Hard sepsis"), have("MOF sepsis character"). diag("MOF"):- diag("Shock sepsis"), have("MOF sepsis character").

As you can see in Listing 1, the output mechanism should be completely controlled by the standard output mechanism built into Prolog.

ECs organized on logic (on facts) are more flexible, since, in contrast to the direct connection of rules, they are organized on the reference (indirect) method of describing relations of rules. Links are written in terms of terms, such as rule numbers, rather than nested rules of each other. Such a record is more consistent with a strict record in the form of predicate logic sentences, which makes the mechanism for deriving such rules more flexible. In contrast to organization on rules, the search for a Prolog solution is less dependent on the logical derivation process of the next rule, since in addition to output, it is possible to perform some other actions, for example, walk along an alternative decision tree or change the output at all and return to its original position all data about it.

Just as in the organization, the rules here operate a cycle: matching - conflict resolution - transition to the next rule, but since the output mechanism is controlled indirectly, it is convenient in the process to display information about the search for a solution and change this format without changing the knowledge base. The main advantage of organizing on logic is that the variables that are matched in the output process are automatically available in the output mechanism, and can be output together by a chain of recent rules that are derived to the current goal. But for this, it is necessary to additionally provide a type ghost mechanism for different types of rules. Also, the knowledge base is stored as facts, so it can be saved or read from a file on the media. It is harder to modify and debug a system on facts than on rules, since the output mechanism is controlled by the values of variables and terms, for example, in the form of a rule number, which introduces additional possibilities for errors.

Also in the inference engine, there are more variables affecting this process, sometimes of a recursive type and rather interconnected, which complicates the development and error capture, but this is the price for the flexibility it provides. The system also works on facts more efficiently and faster than on rules, which is especially important for bulky real-time ES, or even for distributed real-time ES.

A large ES for medical diagnosis in the case of admission of patients in critical condition, especially if used by many doctors at the same time, must also fit into strict time frames.

It is also harder to test a system on facts because of its dependence on facts, since in the system on rules just an error would affect the output mechanism, and this would be immediately apparent, but most likely will continue to work on facts.

As a result, since the system being developed should be informative for the user, namely, a reasonable answer to the question of why the system needed the required information should be displayed in the process of logical inference so that the user knows at what stage of the search for solutions he is, which cannot be ensured without saving the trace. search. Also at the end, after answering the question, the user should be able to see the proof tree for the solutions found. Taking into account all the parameters of the system, the system on the rules seemed to me the most rational, since it can be saved to disk and its output mechanism will be easier to organize. Also, knowledge on the diagnosis of cardiovascular diseases turned out to be rather difficult to formalize due to their semantic interconnectedness, especially in those places where the meaning of the conclusions of the rules meant taking into account additional information for each one. Therefore, when choosing a mechanism for exchanging data of different types between predicates, a calculation was made for the logical method of organization.

There are also 2 types - logical inference: reverse and forward logical inference. In complex systems of medical consultations, a combination of these types is also used: some kind of knowledge is derived from one type, and on its basis another one is already being used.

The direct conclusion is to find the effect on the basis of a multitude of facts and the subsequent conclusion from the new consequences of other conclusions. It is effective when there are many hypotheses related at different levels that need to be proved or when there are many axioms for the derivation of many more hypotheses.

The converse is arranged the other way around, first a hypothesis is chosen that needs to be proved, and then an attempt is made to prove the premise of this hypothesis until the next premise is found as an axiom.

In ES, the reasoning on the rules is implemented by the method of reverse inference, which uses Prolog. This option was chosen, since, unlike direct inference, it works more efficiently with this knowledge, since the number of target vertices is much more than facts and there are no competing hypotheses (they are independent).

Implementation

In Prolog, rules are implemented as a rule predicate. Each rule has its own number and name, which is a hypothesis (conclusion) that needs to be proved. The condition of the rule corresponds to an implicit statement that completely determines the truth of the hypothesis.

Rule condition:

 Colclusion(K)=C1(K) and (C2(K) or C3(K)) and C4(K)

C1 (K) = True all conclusions of the rules with the numbers recorded in the first list for the K-th rule

C2 (K) = Truly at least one rule from the second list for the K-th rule
C3 (K) = Truly exactly N (K) conclusions of the rules from the second list for the K-th rule

C4 (K) = Predicate for the Kth rule (predicate doc (K)), which the user describes.
If any list is empty, then the corresponding statement is true.

In this case, any other predicates can be entered into the predicate for C4, but they should not violate the logical meaning of the conclusion. To support denial, some existing conclusion in the knowledge base can be entered here and his.

The inference engine ensures that the rules are applied in the desired sequence and stores the trace.

The main principle of inference is to prove the truth of the conclusion, to consistently prove all its conclusions in the first list, then to prove the conclusions of the second list and at the end to check the truth of the predicate corresponding to the number of the current rule. To check the second list, if at least one true rule is found, all further rules will be checked to the end of the list.

Predicate Description

In general, the principle of knowledge coding is implemented in Prolog using two predicates in the following form:

rule (K, “Conclusion text - name of the rule”, Id, List1, List2, N)
doc (K)
K - rule number
Id - The data identifier for this rule.
List1 - the first list of rule numbers
List2 - second list of rule numbers
N is the number of true conclusions

Moreover, the rule is always written as a fact, without variables, and its truth, as was said, determines, in addition to lists, the corresponding predicate doc.

Here are some examples of knowledge records:

rule (93, "TASH - TASH-score 43%", tashSc10, [47], [], 0)
rule (33, "TASH - Excess of base - is the sum of points", none, [], [29,30,31,32], 1).

In the predicate doc, any other predicates can be used, which makes it possible to match the text of a rule statement to a set of any situations that can be described by means of a Prolog.

Thus, all knowledge is written using these two predicates.

In doc predicates, the user interface functions are mainly called, since some rules are implicitly connected with the user's response or they are checked for valid values.

For example, doc predicates for rules whose truth depends on the interval in which the total score is located, make the appropriate checks.

Predicate doc for diagnosing diseases of sepsis uses an additional predicate docc, allowing you to not ask the user too many questions. For example, if a sign requires the presence of at least two signs, with a negative answer to two signs already, the system should not ask more questions, since it is obvious that the sign cannot be true. For this purpose, the docc predicate provides for checking for the presence in the database of more than two negative answers. It also does not make sense to ask the third question if there is already an answer to two questions sufficient to establish the truth of the resulting feature.

The predicate kolNeg (Number) searches for the number of negative answers in the database for this group of attributes. To do this, he first looks for various signs of this group of signs, so as not to confuse them with other groups, and then counts them from the set in the database using the predicate kol_neg_list_in_db, which works with the list of signs of this group.

Code examples

The project is large, so I will give the most important passages.

Listing 2 - List of rules

 rule(11,"Sepsises - SIRS призн. 1: t>38C или t<36С",sirsPr1,[],[],0). rule(12,"Sepsises - частота сердечных сокращений>90",sirsPr2,[],[],0). rule(13,"Sepsises - частота дыхания>20 или PaCO2<32mmHg",sirsPr3,[],[],0).

Listing 3 - Fact List

 fact1(sirsPr1,"SIRS","SIRS призн. 1: t>38C или t<36С"). fact1(sirsPr2,"SIRS","частота сердечных сокращений>90"). fact1(sirsPr3,"SIRS","частота дыхания>20 или PaCO2<32mmHg"). fact1(sirsPr4,"SIRS","кол-во лейкоцитов>12.000/mm>3, <4.000/mm>3 или >10% band").

- List of logical consequences

 tash_score(1,tashSc1,0,8,"Вероятность прогноза <5%"). tash_score(2,tashSc2,9,9,"Вероятность прогноза 6%"). tash_score(3,tashSc3,10,10,"Вероятность прогноза 8%").

Listing 4 - Output Machine

 sizeList([],0):-!. sizeList([_|T],Size):- sizeList(T,SizeTail), Size=SizeTail+1. append_der_list([],List,List). append_der_list([H|L1],List2,[H|L3]):- append_der_list(L1,List2,L3). any2(NeedSize,NeedSize,_,[],[],_):-!. any2(_,_,[],[],[],_):-!. any2(NeedSize,Size,[H|T1],[H|T2],[FirstDer|OtherDer],Why):- Nomer=[H], go(Nomer,UnderFirstDer,Why), rule(H,Text,_,_,_,_), FirstDer=tree(Text,unmarked,UnderFirstDer,0),%der(H,UnderFirstDer), Size1=Size+1,!, any2(NeedSize,Size1,T1,T2,OtherDer,Why). any2(NeedSize,Size,[_|T1],List,OrDer,Why):- !,any2(NeedSize,Size,T1,List,OrDer,Why). go([],[],_):-!. go([H|T],[FirstDer|OtherDer],Why):- rule(H,Name,_,ListAnd,ListOr,KolOr), NewWhy=[Name|Why], go(ListAnd,UnderFirstDer,NewWhy), goOr(ListOr,KolOr,_,OrDer,NewWhy), append_der_list(UnderFirstDer,OrDer,TwoDers), FirstDer=tree(Name,unmarked,TwoDers,0), asserta(why_trace(NewWhy)), doc(H,NewWhy), go(T,OtherDer,Why). goOr([],_,[],[],_):-!. goOr(ListOr,KolOr,ListYes,OrDer,Why):- KolOr<>0, any2(KolOr,0,ListOr,ListYes,OrDer,Why), sizeList(ListYes,KolOr). goOr(ListOr,0,ListYes,OrDer,Why):- any2(100000,0,ListOr,ListYes,OrDer,Why), sizeList(ListYes,KolListYes), KolListYes>0.

Listing 5 - Concluding Outcomes

 tashQuestion(Id):- fact2(Id,_,Prisnak,_), pos(Prisnak),!. tashQuestion(Id):- fact2(Id,_,Prisnak,_), neg(Prisnak),fail,!. tashQuestion(Id):- fact2(Id,_,Prisnak,Ball), not(neg(Prisnak)), not(pos(Prisnak)), dialog_ynw(Prisnak,Ans), tash_in_data_base(Ans,Prisnak,Ball),!. tash_in_data_base("y",Prisnak,Ball):- asserta(pos(Prisnak)),sum_tash(Sum1),Sum2=Sum1+Ball,asserta(sum_tash(Sum2)),!. tash_in_data_base("n",Prisnak,_):- asserta(neg(Prisnak)),!,fail. tash_in_data_base(_,_,_):- write("\nTASH-not correct answer"),!,fail. oneQuestion(Id):- fact1(Id,_,Prisnak), pos(Prisnak),!. oneQuestion(Id):- fact1(Id,_,Prisnak), not(neg(Prisnak)), question_sepsis(Prisnak),!.

findings

I hope this article will help beginners to build their expert system.