Rule based diagnosis system for diabetes
In an era in which diseases such as diabetes affect much of the world, there is a need for an expert system that can predict diabetes at a very early stage with minimal fuss and in a time-efficient manner. The system should be able to forecast whether or not a person is suffering from diabetes, and to predict the probability of each type of diabetes (type 1, type 2, pre-diabetes, and gestational) from which the patient may be suffering. The developed system draws on evident factors collected from a wide range of sources such as physicians, books, the Internet, and medical journals. These factors act as indicators and serve as the basis for rule formation. The rules are further engineered with techniques such as fuzzy logic to create an inference engine. User input is passed to the inference engine, which subsequently produces its output as a diabetes type together with its probability of occurrence. The proposed expert system is based on a particular demographic and has been validated on selected patients, for whom it produced one hundred percent accurate results.
Rule-based system, Knowledge based system, Fuzzy set, Diabetes, Decision support system, Inference engine, Model view controller.
Even though healthcare has developed by leaps and bounds, diabetes is proving to be one of the most chronic and rampant diseases. Diabetes is the term used to describe a metabolic condition of having higher than normal blood sugar levels, also called hyperglycemia. Diabetes occurs when the body does not properly produce or respond to insulin, which is needed to maintain the proper level of sugar in the human body. Diabetes can be controlled with insulin injections, oral medications (pills), a controlled diet (changed eating habits), and exercise programs, but no comprehensive cure is available as yet.
The basic major and minor symptoms for all types of diabetes are listed in Table 1, in which a symptom that applies to a particular type of diabetes is stored as 1 and one that does not apply is stored as 0. Some of the symptoms in the table are not applicable to females, such as impotency (erectile dysfunction); conversely, some are considered only for females because they are not applicable to males, such as previous pregnancy, baby over 9 pounds during previous pregnancy, nausea, vaginal mycotic infection, loss of menstruation, polycystic ovary syndrome, and low blood sugar in the baby immediately after delivery.
|S. no.||Symptoms||Type 1||Type 2||Pre diabetes||Gestational|
|2||Increased urge to urinate||1||1||1||0|
|13||Depression and stress||0||0||1||0|
|15||Fruity breath odour||1||0||0||0|
|18||Family history of diabetes during pregnancy||0||0||0||1|
|20||Baby over 9 pounds during previous pregnancy||0||0||0||1|
|31||Aches and pains||0||1||0||0|
|32||Recurrent fungal infection||0||1||0||0|
|38||Vaginal mycotic infection||1||1||0||1|
|39||Rapid heart beat||0||1||0||0|
|40||Recurring gum infections||0||0||1||0|
|42||High blood pressure||0||0||0||1|
|44||Making unusual noises||1||1||0||0|
|49||Loss of menstruation||1||1||0||0|
|52||Areas of darkened skin||0||1||0||0|
|55||Lack of coordination||1||1||0||0|
|56||History of heart disease||0||0||1||0|
|57||Polycystic ovary syndrome||0||0||1||0|
|58||Low blood sugar in the baby immediately after delivery||0||0||0||1|
|59||Waist size more than 102 cm in male and 88 cm in female||0||0||1||0|
|60||Waist to hip ratio more than 0.9 in male and 0.85 in female||0||0||1||0|
Table 1: Knowledge base of the diabetes disease diagnosis system.
Diabetes can be categorized into four types:
Type 1 (Juvenile or Insulin Dependent or Brittle or Sugar) diabetes
Type 2 (Adult onset or Non-insulin dependent) diabetes
Pre diabetes
Gestational diabetes
Type 1 diabetes mostly affects children and young adults but can occur at any age. About 5-10% of diabetes patients suffer from type 1 diabetes. The body is unable to produce insulin, which helps the body to use sugar from food as a source of energy, because the insulin-secreting cells (β cells) of the pancreas are destroyed. The cause is immune-mediated or idiopathic. People with type 1 diabetes usually require external insulin therapy, hence the name Insulin Dependent Diabetes Mellitus (IDDM). The basic major and minor symptoms for type 1 diabetes are listed in the Type 1 column of Table 1, in which a symptom that applies to this type of diabetes is stored as 1 and one that does not apply is stored as 0.
Type 2 diabetes is the most common type of diabetes, with 90-95% of all diabetes patients suffering from it. Type 2 diabetes symptoms often develop slowly, and the condition can be asymptomatic for many years. This type mostly occurs in people more than forty years old but can also be found in younger age groups. The symptoms are usually the same for men and women; however, certain symptoms are gender based, such as urological problems like Erectile Dysfunction (ED), which are seen only in males. The body loses the ability to produce adequate insulin (insulin deficiency), or the body cells do not respond to insulin (insulin resistance), or both. It can be controlled with lifestyle modification, oral medications (pills) and, if required, insulin therapy, but no complete cure is available for this type of diabetes either.
The basic major and minor symptoms for type 2 diabetes are listed in the Type 2 column of Table 1, in which a symptom that applies to this type of diabetes is stored as 1 and one that does not apply is stored as 0.
Pre diabetes is a milder form of diabetes that is sometimes called impaired glucose tolerance. It can be diagnosed with a simple blood test.
The basic major and minor symptoms for pre diabetes are listed in the Pre diabetes column of Table 1, in which a symptom that applies to this type of diabetes is stored as 1 and one that does not apply is stored as 0.
Gestational diabetes occurs during pregnancy. It raises the mother’s risk of getting diabetes for the rest of her life and also raises the child’s risk of being overweight and getting diabetes. It presents as a high blood sugar level during pregnancy, usually occurs at around 28 weeks or later, and affects about 4% of all pregnant women. This type of diabetes usually goes away after pregnancy, and its causes are as yet unknown.
The basic major and minor symptoms for gestational diabetes are listed in the Gestational column of Table 1, in which a symptom that applies to this type of diabetes is stored as 1 and one that does not apply is stored as 0.
In this research work, the authors have designed an expert system and modelled an inference engine based on fuzzy logic (weighted fuzzy logic) to predict the occurrence of each diabetes type. The authors’ target demographic was daily patients (age group 25+, both male and female) who visited a certain physician with diabetes-related problems.
The rest of the paper is organized as follows: a brief description of the Clinical Decision Support System (CDSS) and Model View Controller (MVC) is given in section 2, the problem definition or motivation is introduced in section 3, related work is presented in section 4, the proposed methodology is discussed in section 5, results and discussion are presented in section 6, and section 7 is devoted to discussion and future directions.
Clinical Decision Support System (CDSS)
A CDSS, also called simply a Decision Support System (DSS) or health information technology system, is a computer program or software application that analyses data and is designed to assist health professionals or physicians in making clinical decisions or determining a diagnosis from patient data. It may also be defined as an active knowledge system that uses two or more items of patient data to generate case-specific advice. Its purpose is to improve clinical practice by supporting diagnosis, analysis, etc. of patient data.
Figure 1 shows the block diagram of the DSS for diabetes diagnosis. This research work emphasizes fuzzy logic, as may be seen in Figure 1.
Knowledge based CDSS: A Knowledge Based System (KBS) is a computer program that uses Artificial Intelligence (AI) techniques to solve problems within a specialized domain that ordinarily requires human expertise. A Knowledge Based (KB) CDSS applies the rules in a KB to patient data using an Inference Engine (IE) and displays the result to the end user. It consists of three parts: the KB, the IE, and a mechanism to communicate (User Interface). The communication mechanism allows the system to display results to the user as well as to accept inputs into the system. The typical tasks for expert systems involve classification, diagnosis, prognosis, monitoring, design, scheduling, and planning for specialized tasks. A KBS is a more general concept than an expert system.
A KBS draws upon the knowledge of human experts captured in a knowledge-base to solve problems that normally require human expertise.
An Expert System (ES) is sometimes called a KBS. An ES is judged to be successful when it operates at the level of a human expert.
An ES is needed for the following reasons, which relate to the limitations of human experts:
Scarcity of Human expertise
Tiredness from physical or mental workload
Forget crucial details of a problem/Limited working memory
Inconsistent in their day-to-day decisions/Slow in recalling information stored in memory
Unable to comprehend large amounts of data quickly/Unable to retain large amounts of data in memory
Can deliberately avoid decision responsibilities
Lie, hide and die.
The advantages of ES are:
Increased availability of expert knowledge
Efficient and cost effective
Consistency of answers
Explanation of solution
Deals with uncertainty
However the limitations of ES are:
Lack of common sense
Systems are not always up to date
May have high development costs
Restricted domain of expertise limited to knowledge base
Not always reliable
The following are examples of expert systems in the field of medicine: Zynx Health, MYCIN (microbial disease diagnosis and treatment), CADUCEUS (internal medicine disease diagnosis), Internist-I, PUFF (pulmonary disease diagnosis), VM (monitoring of patients in need of intensive care), ABEL (diagnosis of acidic materials and electrolytes), AI/COAG (blood disease diagnosis), AI/RHEUM (rheumatic disease diagnosis), ANNA (monitoring and treatment analysis), BLUEBOX (depression diagnosis and treatment), ONCOCIN (treatment and management of chemotherapy patients), ATTENDING (anesthesia management education), GUIDON.
Figure 2 shows how an expert system works. If a non-expert user or layman raises a query, the IE helps to find the answer or gives advice to the layman.
The components of a KBS are:
Knowledge base (facts)
Knowledge base (KB): A KB is a technology employed to store complex structured and unstructured information used by a computer system. KB contains facts and rules about the task domain. The domain is the area of expertise that the ES is intended to work within.
A KB is an organized collection of facts about the system’s domain. An IE interprets and evaluates the facts in the KB in order to provide the answer.
Table 1 has been designed with the help of physicians, books, the Internet, medical journals, diabetes patients, etc. It contains all the symptoms of diabetes and assigns them to the types. Table 1 is treated as a matrix in the database.
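As a minimal sketch of how Table 1 might be held as a matrix, the Java fragment below stores four of its rows as 0/1 flags per diabetes type and looks up the types to which a symptom applies. The class name KnowledgeBaseSketch and the method typesFor are hypothetical and are not taken from the authors' implementation, which keeps the matrix in a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a few rows of Table 1 held as a symptom-by-type 0/1 matrix.
class KnowledgeBaseSketch {
    static final String[] TYPES = {"Type 1", "Type 2", "Pre diabetes", "Gestational"};

    // Key: symptom name; value: applicability flags in the column order of TYPES.
    static final Map<String, int[]> MATRIX = new LinkedHashMap<>();
    static {
        MATRIX.put("Increased urge to urinate", new int[] {1, 1, 1, 0});
        MATRIX.put("Fruity breath odour", new int[] {1, 0, 0, 0});
        MATRIX.put("Vaginal mycotic infection", new int[] {1, 1, 0, 1});
        MATRIX.put("High blood pressure", new int[] {0, 0, 0, 1});
    }

    // Returns the diabetes types whose column holds 1 for the given symptom.
    static List<String> typesFor(String symptom) {
        List<String> result = new ArrayList<>();
        int[] row = MATRIX.get(symptom);
        if (row == null) {
            return result; // unknown symptom: no applicable type
        }
        for (int i = 0; i < row.length; i++) {
            if (row[i] == 1) {
                result.add(TYPES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(typesFor("Vaginal mycotic infection"));
    }
}
```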
Inference engine (IE): An IE tries to derive an answer from a knowledge base. It evaluates the set of rules on the given set of inputs. It is the brain of the expert systems that provides a methodology for reasoning about the information available in the knowledge base and for formulating conclusions.
In the context of an ES, the IE is the program that operates on the KB and produces inferences. If the KB is regarded as a program, then the IE is its interpreter. The IE uses the rules contained in the rule base to produce the decision-making output. In a simple rule-based system, there are two kinds of inference: forward chaining and backward chaining.
Forward chaining (FC) starts with the known facts and asserts new facts. Its opposite, backward chaining, starts with goals and works backwards to determine what facts must be asserted so that the goals can be achieved. FC is a popular implementation strategy for expert systems and for business and production rule systems. It starts with the available data and uses inference rules to extract more data until a goal is reached. Event-driven systems are a common application of forward chaining rule engines; one example of a forward chaining application might be a telecoms plan provisioning engine.
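The forward chaining loop can be sketched in a few lines of Java. This is a generic illustration under simple assumptions (facts are plain strings and a rule fires when all of its premises are known); the class ForwardChainingSketch and the two toy rules in main are hypothetical and are not the rule base of the proposed system.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of forward chaining: fire rules until no new fact appears.
class ForwardChainingSketch {
    // A rule asserts its conclusion once all of its premises are known facts.
    static final class Rule {
        final Set<String> premises;
        final String conclusion;

        Rule(Set<String> premises, String conclusion) {
            this.premises = premises;
            this.conclusion = conclusion;
        }
    }

    static Set<String> infer(Set<String> initialFacts, List<Rule> rules) {
        Set<String> facts = new HashSet<>(initialFacts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                // Set.add returns true only when the conclusion is a new fact.
                if (facts.containsAll(rule.premises) && facts.add(rule.conclusion)) {
                    changed = true;
                }
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        // Toy rules for illustration only; they carry no clinical meaning.
        List<Rule> rules = List.of(
            new Rule(Set.of("fact A", "fact B"), "fact C"),
            new Rule(Set.of("fact C", "fact D"), "goal"));
        Set<String> facts = infer(Set.of("fact A", "fact B", "fact D"), rules);
        System.out.println(facts.contains("goal"));
    }
}
```

The loop stops when a full pass over the rules adds nothing, which is what makes the reasoning data-driven: no goal is supplied, and whatever can be derived from the inputs is derived.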
The IE is the main processing module. Its input is the set of answers to the questionnaire entered by the physician. The IE interacts with the knowledge base, which is constructed using a fuzzy logic rule base.
User interface: This acts as a mediator between the user and the application. Its purpose is input and output. J2SE has been used here. In J2SE, user interface elements can be classified as components, which are the child windows placed over containers (Label, Button, TextField, TextArea, List, Choice, Scrollbar, Checkbox), and containers, which are the parent windows that hold the components (Applet, Frame, Dialog).
Non knowledge based CDSS: A CDSS that does not use a KB. It instead uses a form of AI called machine learning, such as NN, GA, EC and SVM, which learns from past experience.
Rule based: A rule base contains numerous fuzzy IF-THEN rules. The fuzzy rules for this research were developed with the help of diabetologists.
Evidence based: An evidence-based approach applies the best available research results (evidence) or best research-proven assessments when making decisions about health care, along with clinical expertise and patient preferences. It uses methods such as carefully summarizing research and putting out accessible research summaries. Its goal is to eliminate unsound or excessively risky practices in favor of those that have better outcomes.
Fuzzy logic: Fuzzy logic accepts imprecise and vague data and provides a decision. Fuzzy logic representations are founded on fuzzy set theory and try to capture the way humans represent and reason with real-world knowledge in the face of uncertainty. Lotfi A. Zadeh introduced fuzzy set theory in 1965 as a generalization of classical set theory. Fuzzy set theory is an excellent mathematical tool to handle the uncertainty arising due to vagueness.
The advantages of fuzzy logic are mentioned in .
Model view controller (MVC): Trygve Reenskaug presented MVC in 1979. It was first used in Smalltalk, an environment that influenced the design of Apple’s user interfaces. MVC stands for model view controller. It is an architectural pattern used for implementing user interfaces, it is available in the major programming languages, and it is today among the most widely used patterns for designing World Wide Web applications.
MVC components are usually implemented as separate classes.
The MVC model defines web applications with 3 logic layers or tiers:
Tier 1: View logic or display layer or Client
Tier 2: Input layer or Controller logic or Server
Tier 3: Model logic or business layer or Database
It provides clear separation between presentation logic and business logic.
In Figure 3, it may be seen that the model falls in the category of data, whereas the controller and view together form the presentation.
Data: The model deals with data or storage. It is the internal representation and is used to manage the data, logic, and rules of the application in terms of its problem domain. The model stores data based on the commands of the controller, and that data is then displayed in the view. It can have logic to update the controller as the data changes.
Presentation: The view deals with display, presentation or rendering. It is the user interface (e.g., a button) or the window on the screen with which the user interacts. It is used to display or visualize the data contained in the model and to generate and present output to the user based on the model. If something changes in the model, the view generates new output. The controller combines the model with the view. The controller is the code (e.g., the callback for a button). It acts on both the model and the view and is the glue between them. It decides what the model is supposed to do, updates the model, and handles the view. It accepts inputs and converts them to commands for the model or view. It controls the data flow into the model and updates the view whenever the data changes. It keeps the view and the model separate.
The proposed system has been architected on the concepts of MVC, where the user inputs or selects various symptoms. The rule-based environment acts as the view component of MVC, the inference engine and knowledge-based algorithms form the model component, and the logic that connects the view and the model acts as the controller component, which manages the overall information flow and presents the end result in the form of a suitable view.
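The division of responsibilities described above can be sketched as three small Java classes. All names here (MvcSketch, DiagnosisModel, DiagnosisView, DiagnosisController) are hypothetical, and the model body is a placeholder rather than the authors' inference engine; the sketch only shows how the controller sits between the view and the model.

```java
import java.util.Set;

// Hypothetical sketch of the MVC split; all names are illustrative.
class MvcSketch {
    // Model: stands in for the knowledge base plus inference engine.
    static final class DiagnosisModel {
        String diagnose(Set<String> symptoms) {
            // Placeholder logic only; a real model would apply the weighted fuzzy rules.
            return symptoms.isEmpty()
                ? "no symptoms selected"
                : symptoms.size() + " symptom(s) received";
        }
    }

    // View: formats the result for display.
    static final class DiagnosisView {
        String render(String result) {
            return "Result: " + result;
        }
    }

    // Controller: passes user input to the model and the model's answer to the view.
    static final class DiagnosisController {
        private final DiagnosisModel model;
        private final DiagnosisView view;

        DiagnosisController(DiagnosisModel model, DiagnosisView view) {
            this.model = model;
            this.view = view;
        }

        String handle(Set<String> selectedSymptoms) {
            return view.render(model.diagnose(selectedSymptoms));
        }
    }

    public static void main(String[] args) {
        DiagnosisController controller =
            new DiagnosisController(new DiagnosisModel(), new DiagnosisView());
        System.out.println(controller.handle(Set.of("Tiredness", "Increased thirst")));
    }
}
```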
Knowledge-based systems and intelligent computing systems are used for diagnosis and treatment. Most of these approaches involve models based on fuzzy logic, Artificial Intelligence (AI), Genetic Algorithm (GA), Neural Network (NN), Support Vector Machine (SVM), Evolutionary Computing (EC), or a hybridization of these techniques with an appropriate reasoning mechanism. The proposed system is implemented as a rule-based expert system that uses fuzzy logic as the methodology for diagnosing diabetes. Fuzzy sets have been shown to give a better knowledge representation that improves the decision-making process.
Computers have transformed the world and have become part and parcel of almost every aspect of human existence. With the help of computers, it is easier for physicians to analyse, diagnose and treat diseases and other medical problems. Expert systems are now widely used in clinics and hospitals all over the world. The proposed expert system is useful for patients as well as for physicians. However, it has been developed not for direct use by patients but for physicians, to help them make a correct diagnosis by determining which type of diabetes the patient is suffering from and the maximum probability of its occurrence.
Problem Definition or Motivation
Until some years ago, physicians diagnosed disease with a combination of pure experience and the patient’s clinical data, mainly laboratory test reports. Laboratory test reports may vary depending on meals, exercise, sickness, stress, change in temperature, the type of equipment used, and the way samples are handled. This kind of diagnosis is time consuming because it depends entirely on the availability and experience of physicians, who deal with imprecise and uncertain clinical data. Researchers have reported that even experienced physicians are not able to detect the disease quickly and accurately; they require several rounds of feedback from the patient to diagnose a particular disease. Diagnosing more accurately and more quickly is a persistent problem for physicians. The main causes of misdiagnosis are stated below:
Lack of communication: Patients do not communicate well with their physicians, that is, they do not reveal the right symptoms from which they suffer.
Lack of experience: The patient does not go to a well-experienced physician, or goes to a specialist whose field only partially matches the problem.
Lack of pathological or laboratory tests: Lack of well-established equipment, chemicals, experienced pathologists, etc. In developing countries, well-equipped laboratories, where they exist, are available only in a few big cities or metropolitan areas.
Comak et al. combined fuzzy weighting pre-processing with LS-SVM: fuzzy weighting is used to pre-process a liver disorder dataset, and LS-SVM is then used for classification on the pre-processed dataset. This achieved 94.29% classification accuracy, reported as the highest classification rate in the literature, whereas LS-SVM alone, without pre-processing, achieved only 60.0%.
S et al. used rough sets to design and develop a diabetic diagnostic system in which approximation sets are first generated and the diagnosis is then made by taking only those objects into account.
Chu et al.  built a data warehouse and a forward-chaining rule-base expert system to assist researchers in exploring the data to select appropriate statistics methods to find possible significant differences. The prototype of the expert system has been implemented through data warehouse and On-Line Analytical Processing (OLAP). Microsoft SQL server 2005 is used to construct a data warehouse and OLAP because it provides a friendly user interface for building data warehouse and OLAP.
Borgohain and Sanyal proposed a questionnaire-driven rule-based expert system for the diagnosis of neuromuscular disorders, i.e. cerebral palsy, Parkinson’s disease, muscular dystrophy and multiple sclerosis, implemented using the Java Expert System Shell (JESS). The system uses backward chaining for the inference engine and the RETE algorithm for searching the knowledge base.
Naser and Ola proposed an expert system for diagnosing eye diseases, implemented using CLIPS (C Language Integrated Production System).
Soundararajan et al. used fuzzy logic for a decision support system in tuberculosis medicine. The method is well suited to developing knowledge-based systems for tuberculosis and helps physicians to detect the class of tuberculosis, reducing the time consumed by the diagnostic process.
Choubey and Paul introduced GA_MLP NN for the classification of PIDD. The work consists of two stages: first GA is used for feature selection, then a Multi-Layer Perceptron Neural Network (MLP NN) is used for classification, both on the features selected by GA and on all the features. The authors compared the results of MLP NN and GA_MLP NN to confirm the benefit of feature selection.
Pabbi used fuzzy rules for the diagnosis of dengue fever. The rules were made with the help of a physician using fuzzy logic.
Sharma used fuzzy rules for the treatment of epistaxis patients. This rule-based expert system helps physicians to improve the probability of finding the disease more accurately and quickly.
Choubey and Paul analysed and compared several existing works on diabetes in terms of their advantages, issues, techniques, tools used and future work. The techniques and tools used were also discussed on the basis of the following parameters: advantages, issues and applications.
Anooj used a weighted fuzzy rule-based CDSS for the diagnosis of heart disease. Mining techniques were used for attribute selection, the selected attributes were used to generate fuzzy rules, and the rules were then weighted based on their frequency in the learning datasets. These weighted fuzzy rules were used to build the CDSS with a Mamdani fuzzy inference system (FIS).
Tomar presented a rule-based diagnostic decision support system for medical disease diagnosis. The system assists the physician in determining the best course of treatment.
Zeki et al.  designed a rule based expert system for diagnosis of all kinds of diabetes. This rule based expert system has been also tested and validated.
Choubey and Paul used GA_J48graft DT for the classification of PIDD. The J48graft Decision Tree (J48graft DT) is used for classification of the data, GA is used for feature selection, and classification is then performed once again on the selected features.
Kahramanli and Allahverdi used an Artificial Neural Network (ANN) and a Fuzzy Neural Network (FNN) for the classification of heart disease and diabetes. The method was applied to the PIDD and Cleveland heart disease datasets and achieved accuracies of 84.24% and 86.8% respectively, better than several existing techniques.
Barakat et al. worked on the classification of diabetes using a machine learning approach, SVM. The authors presented a Sequential Covering Approach (SQRex-SVM) and an eclectic method for rule extraction, to make SVMs more intelligible.
The process chart of the methodology is shown below.
Initially, the information (symptoms/questionnaire) is collected from physicians, books, the Internet, medical journals and diabetes patients, covering the symptoms affecting them that may be related to diabetes. As may be seen in Figure 4, after the questionnaire was selected it was tested on 115 diabetes patients at Bombay Medical Hall, Ranchi, India. Of the 115 patients, 74 were male and 41 were female. Among them, 94 answered yes to “increased thirst”, 56 answered yes to “weight reduction”, 86 answered yes to “tiredness”, etc. The weights 0.817, 0.487 and 0.748 (the fraction of the 115 patients answering yes) were then assigned to the respective symptoms. Symptoms that occur only in females, such as “Vaginal mycotic infection”, “Loss of menstruation” and “Polycystic ovary syndrome”, were also considered; 21, 12 and 17 patients respectively said yes to suffering from these symptoms, and weights of 0.51, 0.48 and 0.17 were assigned to them. In this way, a weight was calculated for every symptom. Likewise, a probability is assigned to every type of diabetes. Suppose the symptom “Increased thirst” is found in type 1, type 2 and pre diabetes; then a probability of 0.33 is assigned to each of these types. The symptom “Weight reduction” occurs only in type 1, so the probability assigned is 1.00. “Tiredness” occurs in type 1 and type 2, so the probability assigned to each is 0.5. After the questionnaire has been filled in by the diabetes patient, the weight of every symptom is multiplied by the respective probability for every type of diabetes using the fuzzy logic rules. The result shows which kind of diabetes has the highest chance of afflicting the particular patient. After this knowledge-based system was prepared, it was validated on 5 patients, and all results were confirmed against verified laboratory test reports in order to rectify any error that may have occurred because the patients came from the same demographic.
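The arithmetic behind the common-symptom weights and the per-type probabilities can be reproduced with a short Java sketch. It assumes, as the figures for the first three symptoms suggest, that a weight is the number of "yes" answers divided by the number of patients asked, and that a probability is one divided by the number of types to which a symptom applies. The class SymptomWeightSketch and its method names are hypothetical.

```java
// Hypothetical sketch of the weight and probability arithmetic.
class SymptomWeightSketch {
    // Weight = patients answering "yes" / patients asked (assumed formula).
    static double weight(int yesCount, int respondents) {
        return (double) yesCount / respondents;
    }

    // Probability = 1 / number of diabetes types flagged 1 for the symptom.
    static double typeProbability(int[] typeFlags) {
        int applicable = 0;
        for (int flag : typeFlags) {
            applicable += flag;
        }
        return applicable == 0 ? 0.0 : 1.0 / applicable;
    }

    public static void main(String[] args) {
        // Counts reported in the text: 94, 56 and 86 "yes" answers out of 115 patients.
        System.out.printf("%.3f %.3f %.3f%n",
            weight(94, 115), weight(56, 115), weight(86, 115));
        // "Increased thirst" applies to type 1, type 2 and pre diabetes.
        System.out.printf("%.2f%n", typeProbability(new int[] {1, 1, 1, 0}));
    }
}
```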
In this approach, symptoms are taken as inputs and fuzzification is applied to them. The fuzzification approach is also used in . Fuzzification converts crisp quantities into fuzzy quantities. After fuzzification, the rules are designed as If-Then rules, which formulate the conditional statements that make up fuzzy logic. A single fuzzy If-Then rule takes the form: If x is ‘A’ then y is ‘B’. After the rules are designed, inference is carried out using the forward chaining method. This method checks the condition part of a rule to determine whether it is true or false. If the condition is true, then the action part of the rule is also true . This process continues until a result is obtained or a dead node is reached. Forward chaining is commonly regarded as data-driven reasoning. After inference over the rules, the output is defuzzified: the defuzzification process converts a fuzzy quantity into a precise quantity, and the expert system displays the result of the analysis.
The fuzzy rules for this research were developed with the help of domain experts, i.e. diabetologists. Some of the rules, covering all four types of diabetes, are as follows:
If ((Increase thirst=true) and (Increase urge to urinate=true) and (Increase appetite=true)) Then type 1 or type 2 or pre diabetes.
If ((Weight reduction=true) and (Fruity breath odour=true) and (Bed wetting=true) and (Nausea=true) and (Vomiting=true) and (Pale skin=true)) Then type 1 diabetes.
If ((Itchy skin=true) and (Family history=true) and (Depression and stress=true) and (Tingling and sensation=true) and (Recurring gum infections=true) and (History of heart disease=true) and (Polycystic ovary syndrome=true) and (Waist size more than 102 cm in male and 88 cm in female=true) and (Waist to hip ratio more than 0.9 in male and 0.85 in female=true)) Then pre diabetes.
If ((Impaired vision=true) and (Impatience=true) and (Infection=true)) Then type 2 or pre diabetes.
If ((Over weight=true)) Then type 2 or pre diabetes or gestational diabetes.
If ((Weight variation=true) and (Slow-healing wounds=true) and (Recurrent fungal infection=true) and (Rapid heartbeat=true) and (Areas of darkened skin=true)) Then type 2 diabetes.
If ((Vaginal mycotic infection=true)) Then type 1 or type 2 or gestational diabetes.
If ((Tiredness=true) and (Sleeplessness=true) and (Trembling=true) and (Sweating=true) and (Anxiety=true) and (Confusion=true) and (Weakness=true) and (Mood swings=true) and (Dry skin=true) and (Aches and pains=true) and (Nightmares=true) and (Seizures=true) and (Sadness=true) and (Unconsciousness=true) and (Numbness=true) and (Impotency=true) and (Sleep walking=true) and (Making unusual noises=true) and (Leg cramps=true) and (Slurred speech=true) and (Flushed face=true) and (Loss of menstruation=true) and (Stomach pain=true) and (Deep breathing=true) and (Difficult concentrating=true) and (Dehydration=true) and (Lack of coordination=true)) Then type 1 or type 2 diabetes.
If ((Family history of diabetes during pregnancy=true) and (Previous pregnancy=true) and (Baby over 9 pounds during previous pregnancy=true) and (Sleep walking=true) and (Low blood sugar in the baby immediately after delivery=true)) Then gestational diabetes.
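One way to hold such If-Then rules in code is sketched below in Java, using two of the shorter rules listed above. The representation (a set of required symptoms and a set of resulting types) and the names DiabetesRuleSketch and candidateTypes are assumptions made for illustration; the source does not specify how the rules are stored.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical encoding of two of the listed If-Then rules.
class DiabetesRuleSketch {
    static final class Rule {
        final Set<String> conditions; // symptoms that must all be true
        final Set<String> types;      // diabetes types named in the Then part

        Rule(Set<String> conditions, Set<String> types) {
            this.conditions = conditions;
            this.types = types;
        }
    }

    static final List<Rule> RULES = List.of(
        new Rule(Set.of("Impaired vision", "Impatience", "Infection"),
                 Set.of("Type 2", "Pre diabetes")),
        new Rule(Set.of("Over weight"),
                 Set.of("Type 2", "Pre diabetes", "Gestational")));

    // Collects the types of every rule whose conditions are all present.
    static Set<String> candidateTypes(Set<String> presentSymptoms) {
        Set<String> result = new LinkedHashSet<>();
        for (Rule rule : RULES) {
            if (presentSymptoms.containsAll(rule.conditions)) {
                result.addAll(rule.types);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(candidateTypes(Set.of("Over weight")));
    }
}
```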
In Figure 5 below, it may be seen that applying weighted fuzzy rules to all symptoms makes it possible to determine the type of diabetes with the maximum probability of occurrence, which helps to save cost, achieve higher accuracy and reduce the time required for diagnosis of diabetes.
The whole process is divided into the following stages:
1) Data collection from knowledge base for all the symptoms
2) Data collection from end users and filterization according to rules
3) Final outcome and decision formulation
Stage 1: Data collection from knowledge base for all the symptoms: Figure 6 depicts the basic flow of data collected from the knowledge base for further processing. In this step, we gather all the information from the knowledge base, a repository created in the system, which keeps information such as:
1). Various symptoms, which are the primary cause for diabetes.
2). Matrix of all the symptoms against the types (type 1, type 2, pre diabetes, gestational), following a particular pattern/rules.
3). Fraction/weight assigned to each symptom, depending on whether it is common (both male and female) or female-specific.
Stage 2: Data collection from end users and filterization according to rules: In this stage, the end user’s (patient’s) inputs are collected through the user interface, which contains the list of all the symptoms. Input is in Yes/No format: the end user either affirms a symptom or condition he/she is facing or denies it.
User input is further processed according to the groups of which particular symptoms are members. This process is known as filterization according to rules. A map is created to hold the positive (yes) values from the end user and the corresponding matrix row for the various types. This map also indicates the probability for each type within a symptom matrix.
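A minimal Java sketch of this filtering step follows. It keeps only the symptoms answered "yes" and attaches to each the per-type probabilities derived from its 0/1 row, assuming the probability is split equally across the applicable types. The class FilterSketch, its method filter and the map layout are hypothetical and do not reproduce the authors' RuleFillterMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of Stage 2: keep "yes" answers and attach per-type probabilities.
class FilterSketch {
    static Map<String, double[]> filter(Map<String, Boolean> answers,
                                        Map<String, int[]> matrix) {
        Map<String, double[]> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> answer : answers.entrySet()) {
            int[] flags = matrix.get(answer.getKey());
            if (!Boolean.TRUE.equals(answer.getValue()) || flags == null) {
                continue; // skip "no" answers and symptoms missing from the matrix
            }
            int applicable = 0;
            for (int flag : flags) {
                applicable += flag;
            }
            double[] probabilities = new double[flags.length];
            for (int i = 0; i < flags.length; i++) {
                probabilities[i] = applicable == 0 ? 0.0 : (double) flags[i] / applicable;
            }
            filtered.put(answer.getKey(), probabilities);
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, int[]> matrix = new LinkedHashMap<>();
        matrix.put("Fruity breath odour", new int[] {1, 0, 0, 0});
        matrix.put("Vaginal mycotic infection", new int[] {1, 1, 0, 1});
        Map<String, Boolean> answers = new LinkedHashMap<>();
        answers.put("Fruity breath odour", false);
        answers.put("Vaginal mycotic infection", true);
        System.out.println(filter(answers, matrix).keySet());
    }
}
```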
Stage 3: Final outcome and decision formulation: In this stage, the filtered user inputs are further processed according to the weight assigned to each symptom. The mfMulitpilerwithType method is called to multiply each symptom in the current matrix by the weight belonging to the Common/Female category. The generated outcome (Matrix_element × Weight × Probability_factor) is collected for each symptom by type, and the average is calculated to predict the final result.
The final outcome is formulated from the following parameters:
i). MaxElement: Double (represents the maximum probability among Type 1, Type 2, Prediabetes, and Gestational)
ii). MaxElement_Type: String (represents the diabetes type for which the maximum probability is found)
iii). Outcome_T1, Outcome_T2, Outcome_PD, Outcome_GS: double (represent the corresponding probability values of each type)
The whole methodology of the proposed work is described in the form of an algorithm, as given below:
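The weighting and averaging in Stage 3 can be illustrated with a short Java sketch. It is not the authors' code: the class `OutcomeSketch` and the method `averagePerType` are hypothetical, the Probability_factor is assumed to be already folded into the per-type row values, and the average for a type is assumed to be taken only over symptoms with a positive contribution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Stage 3: weight each affirmed symptom row, then average per type.
final class OutcomeSketch {
    static final String[] TYPES = {"T1", "T2", "PD", "GS"};

    // rows: symptom id -> per-type values; weights: symptom id -> common/female weight.
    // Assumption: a type's average is taken over symptoms with a positive contribution.
    static double[] averagePerType(Map<Integer, List<Double>> rows, Map<Integer, Double> weights) {
        double[] sum = new double[TYPES.length];
        int[] count = new int[TYPES.length];
        for (Map.Entry<Integer, List<Double>> e : rows.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 0.0);
            for (int t = 0; t < TYPES.length; t++) {
                double contribution = e.getValue().get(t) * w;
                if (contribution > 0) {
                    sum[t] += contribution;
                    count[t]++;
                }
            }
        }
        double[] avg = new double[TYPES.length];
        for (int t = 0; t < TYPES.length; t++) {
            avg[t] = count[t] == 0 ? 0.0 : sum[t] / count[t];
        }
        return avg;
    }

    public static void main(String[] args) {
        // Illustrative values only; real rows and weights come from the knowledge base.
        Map<Integer, List<Double>> rows = Map.of(
                1, List.of(0.5, 0.25, 0.25, 0.0),
                2, List.of(0.0, 0.5, 0.5, 0.0));
        Map<Integer, Double> weights = Map.of(1, 0.5, 2, 1.0);
        System.out.println(Arrays.toString(averagePerType(rows, weights)));
    }
}
```

The array returned by `averagePerType` plays the role of Outcome_T1, Outcome_T2, Outcome_PD, and Outcome_GS; its largest entry corresponds to MaxElement.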
Stage 1: Data collection from knowledge base for all the symptoms
Initialization: initialize following variables, maps and lists
SympotmMap: HashMap < Integer, List < String >>
overallRuleTypeList: ArrayList < WrapProbWithType >
ruleMap1, 2, …, n: HashMap < Integer, List < String >>(To store all the rules based data from Database, n is number of rules e.g. 9)
map_Type_Fraction: HashMap < Integer, List < Double >> (Populate the fractions of each type for a particular symptom)
sym_OverallEffect: HashMap < Integer, Double > (Stores overall effect of present symptoms as per male/female knowledge base)
sym_MaleProbabilty: HashMap < Integer, Double >
sym_FemaleProbabilty: HashMap < Integer, Double >
Filterrules: Map < Integer, String >
UIDB_SympotmMap: HashMap<Integer, List<Double>>
UIDB_ruleMap1, 2, 3…n: HashMap<Integer, List<Double>>
RuleFillterMap: HashMap<Integer, List<String>>
Instantiate Medical_data: Class as Medical_data md=new Medical_data () (To perform data collection from the knowledge base).
Fetch all the knowledge based data based on rule/categories. (Call fetchSymptomsData (rulename): Method).
Store the data collected for all the symptoms into ruleMap (1,2, 3…n): HashMap<Integer, List<String>>
Populate the fractions of each type for a particular symptom and store them in map_Type_Fraction (call getTypeFractionForRules (SympotmMap): Method)
Obtain the data for each symptom from the knowledge base and store in SympotmMap: HashMap<Integer, List<String>> (call fetchSymptomsData (symptomsdata): Method)
Iterate over each symptom in SympotmMap.keySet () and check if Gendergot=’Male’. If this condition is true, populate sym_MaleProbabilty: map with the current symptom count and its fraction/probability specific to the gender type.
If the above condition is false, populate sym_FemaleProbabilty: map with the current symptom count and its fraction/probability specific to the gender type.
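The last two steps of Stage 1 amount to splitting the symptom weights into two maps by gender tag. A minimal Java sketch follows, assuming (hypothetically) that each knowledge-base row is a two-element list of the form [genderTag, weight]; the class `GenderSplitSketch` and its field names are illustrative and are not from the implemented system.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the last Stage 1 step: split symptom weights by gender tag.
final class GenderSplitSketch {
    final Map<Integer, Double> maleOrCommon = new HashMap<>();
    final Map<Integer, Double> femaleOnly = new HashMap<>();

    // Assumption: each row is [genderTag, weight], both stored as strings.
    void populate(Map<Integer, List<String>> symptomMap) {
        for (Map.Entry<Integer, List<String>> e : symptomMap.entrySet()) {
            String gender = e.getValue().get(0);
            double weight = Double.parseDouble(e.getValue().get(1));
            if ("Male".equalsIgnoreCase(gender)) {
                maleOrCommon.put(e.getKey(), weight);
            } else {
                femaleOnly.put(e.getKey(), weight);
            }
        }
    }

    public static void main(String[] args) {
        GenderSplitSketch s = new GenderSplitSketch();
        // Illustrative rows only.
        s.populate(Map.of(1, List.of("Male", "0.6"), 2, List.of("Female", "0.4")));
        System.out.println(s.maleOrCommon + " " + s.femaleOnly);
    }
}
```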
Stage 2: Data collection from end users and filterization according to rules
Collect the data (answers to the questionnaire) from the end user from the UI/View
Instantiate FilterData filter=new FilterData () to filter the data collected according to the rules formulated in the form of their matrix, e.g. (1, 1, 1, 0).
Initialize tmplist (1, 2, 3…n), UI_list (1, 2, 3…n), UIDB_ruleMap (1, 2, 3…n).
Add all the answers to symptoms: ArrayList<String>
Call filter.filterRules (symptoms) to populate Filterrules: Map<Integer, String>
For each symptom_count: int present in ruleMap (1, 2, 3…n).keySet (), fetch the answer selected by the user from Filterrules.get (symptom_count).
If the current answer contains “YES”, populate UI_list (1, 2, 3…n), UIDB_ruleMap (1, 2, 3…n), tmpMapForNetEffect (1, 2, 3…n). If the answer is a negation (NO), ignore it, as it is not considered in the evaluation process.
Call netEffect (HashMap<Integer, String>hm): double method to populate sym_OverallEffect for each of the rules (1, 2, 3, …, n).
Process UIDB_ruleMap (1, 2, 3, … n) and multiply with corresponding Male/Common or Female specific fraction obtained from the knowledge base.
Stage 3: Final outcome and decision formulation
Instantiate WrapProbWithType wpt (1, 2, 3…n)=new WrapProbWithType () and populate wpt.UIDB_ruleMap (1, 2, 3…n) for each of the UIDB_ruleMap (1, 2, 3…n)
overallRuleTypeList: ArrayList<WrapProbWithType>, where WrapProbWithType is a “wrapper class” containing UIDB_ruleMap: HashMap<Integer, List<Double>>
ExpertOutcomeTest FinalOutcomeGot=new ExpertOutcomeTest ();
Where each object of ExpertOutcomeTest contains
MaxElement: Double (represents the maximum probability among Type 1, Type 2, Prediabetes, and Gestational)
MaxElement_Type: String (represents the diabetes type for which the maximum probability is found)
Outcome_T1, Outcome_T2, Outcome_PD, Outcome_GS: double (represent the corresponding probability values of each type)
Initialize getDecisionMap: HashMap<Integer, Double> to indicate the percentage value for each of the ‘n=8’ rules satisfied.
Call mfMulitpilerwithType (overallRuleTypeList): method to populate FinalOutcomeGot: object with the above values.
Call FinalDecisionwithWeitage.DecidewithPercentageMatch (ruleFillterMap): method to populate getDecisionMap, which gives the percentage to which each rule is satisfied.
Algorithm for the mfMulitpilerwithType method
Initialize the following variables, lists and maps
RuleCount: int, mfProb1: double (indicates whether the fraction to be considered is for the Male/Common group or is female specific),
Instantiate UIDB_ruleListgot: ArrayList<Double>, UIDB_ruleListProcessed: ArrayList<WrapRules>(), UIDB_GroupResult: ArrayList<WrapGroups>()
Initialize T1_Cnt, T2_Cnt, PD_Cnt, GS_Cnt as 0 (integers holding the incremental count for each type across symptoms)
Initialize T1, T2, PD, GT (double values that accumulate the percentage for each type).
Take wrpt: ArrayList<WrapProbWithType> as the method parameter and iterate over each wp: WrapProbWithType in wrpt, performing the following steps
Increment rule count
Get wp.UIDB_ruleMap.size () and store in currentMapSize:Int
Initialize currentMapNegSize: int, currentGroupWeight: Double, listelement: double as 0.
Iterate over each currRule_index: int in wp.UIDB_ruleMap.keySet () and perform the following steps
Initialize tmpList: ArrayList<Double>
Set String MF_type=null
Call wp.UIDB_ruleMap.get (currRule_index) and store the result in UIDB_ruleListgot: ArrayList<Double>
IF sym_MaleProbabilty contains currRule_index, call sym_MaleProbabilty.get (currRule_index) and store the result in mfProb1; else call sym_FemaleProbabilty.get (currRule_index) and store the result in mfProb1. Instantiate WrapRules wrules=new WrapRules () (to hold the WrapRules properties).
For each d: Double in UIDB_ruleListgot, if d ≥ 0, perform the following steps:
Multiply mfProb1 by d and store the result in weight: Double
Add weight to tmpList as tmpList.add (weight)
Assign listelement=weight
Populate wrules: object as
and add wrules to UIDB_ruleListProcessed as UIDB_ruleListProcessed.add (wrules).
For each WrapRules wrr in UIDB_ruleListProcessed (outer loop), iterate indexList: int from 0 up to wrr.ruleWeight.size () (inner loop) and perform the following steps:
Call wrr.ruleWeight.get (indexList) and store it in wrd: Double
IF indexList=0, assign T1 as T1+wrd and, if wrd>0, increment T1_Cnt by 1. Similarly calculate T2_Cnt (indexList=1), PD_Cnt (indexList=2), GS_Cnt (indexList=3).
Instantiate WrapGroups wrg=new WrapGroups () and populate the elements of WrapGroups: wrg as
and add wrg to UIDB_GroupResult as UIDB_GroupResult.add (wrg).
Initialize arrfinal: ArrayList<Double>() (stores the sorted final outcome values for each type of diabetes)
finalOutcome: HashMap<String,Double>() (Holds finalOutcome to be returned).
For each element wrpg: WrapGroups In UIDB_GroupResult
finalOutcome.put (“T1”, wrpg.AV_T1)
finalOutcome.put (“PD”, wrpg.AV_PD)
Add each element d: double in finalOutcome.values () to arrfinal list and Call Collections.sort (arrfinal) method to sort the arrfinal in ascending order
Instantiate ExpertOutcomeTest as exOutcome: object to wrap the final outcome result, holding the following properties
In the above algorithm, bold letters indicate variable assignments.
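The final selection step (sorting the per-type values in ascending order and reporting the largest together with its type) can be sketched in Java as follows. The class `MaxTypeSketch` and the method `maxType` are hypothetical names; the example values are those reported for the first validated patient in the results section.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the final step: sort the per-type values and report the largest.
final class MaxTypeSketch {
    // Returns the key holding the maximum value; ties resolve to the first key in map order.
    // The map must be non-empty.
    static String maxType(Map<String, Double> finalOutcome) {
        List<Double> arrfinal = new ArrayList<>(finalOutcome.values());
        Collections.sort(arrfinal); // ascending, as in the algorithm
        double maxElement = arrfinal.get(arrfinal.size() - 1);
        for (Map.Entry<String, Double> e : finalOutcome.entrySet()) {
            if (e.getValue() == maxElement) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable for a non-empty map");
    }

    public static void main(String[] args) {
        Map<String, Double> finalOutcome = new LinkedHashMap<>();
        finalOutcome.put("T1", 22.16);
        finalOutcome.put("T2", 22.45);
        finalOutcome.put("PD", 29.75);
        finalOutcome.put("GS", 13.91);
        System.out.println(maxType(finalOutcome)); // prints PD
    }
}
```

Sorting is used here only to mirror the algorithm's description; a single pass that tracks the running maximum would give the same result.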
The objective of the proposed methodology is to assist experienced physicians in diagnosing diabetes with greater speed and accuracy. The expert system can also assist a novice practitioner in diagnosing diabetes when no experienced physician is available.
This rule-based expert system diagnoses the disease even when the information provided by the patient to the physician is uncertain or imprecise. Disease diagnosis here combines the experience of physicians treating patients in the real world with the experience captured through the expert learning approach of Artificial Intelligence (AI). The experience of physicians treating patients in the real world refers to the data, facts, and statistics that physicians have accumulated in their own practice.
Results and Discussion of Proposed Methodology
The work was implemented on an i3 processor running at 2.30 GHz with 2 GB of RAM and 320 GB of external storage. The software used was JDK 1.8 (Java Development Kit) with the Struts framework (MVC platform), and a MySQL database acted as the repository for the knowledge base. The NetBeans 8.0.2 IDE with the Glassfish (version-4.8) application server provided a comprehensive platform to accommodate the overall architecture, and the XAMPP control panel with admin control provided a central point of control for managing services such as the database service and application execution.
The proposed expert system has been validated on 5 persons. The above figure depicts patient registration, which records the date of entry, medical record number, patient first name, last name, gender, birth place, father’s/parent’s name, mother’s name, date of birth, type of patient, permanent address, and contact number, all of which are stored in the database. The purpose of patient registration is that, if the same patient visits the same hospital again, the diagnosis can be faster and more accurate because the patient’s previous details are already stored in the database.
In Figure 7, it may be seen that the Patient Information form lists the symptoms of diabetes in Yes/No format. A registered patient is required to answer questions about the symptoms, on the basis of which the expert system predicts the diabetes type so that physicians may diagnose the patient accurately and quickly.
As may be seen in the Patient Information data collection form, each form is individualized for a particular patient. The symptoms enumerated in the form encompass the majority of the symptoms associated with a diabetic patient. The form has been kept simple, both in the description of the symptoms and in the Yes/No response to each, to make it easy for patients to understand and respond. This helps in acquiring a comprehensive idea of the patient’s health, particularly in relation to the patient’s diabetic condition.
After the patient’s symptom information is gathered, the expert system’s inference engine produces the outcome for the patient in the form of a probability for each type of diabetes.
Figure 8 indicates that the probability of occurrence of diabetes in the patient is Type 1 (22.16), Type 2 (22.45), Prediabetes (29.75), and Gestational (13.91). In Figure 8, it may also be seen that the outcome is reported as the type of diabetes with the maximum probability of occurrence. Figure 9 presents the same values as Figure 8 in pie chart form for easier understanding. Prediabetes has the highest likelihood of occurring in this patient, with a probability of 29.75%.
Figure 10 indicates the percentage to which each rule has been matched for this particular patient. Physicians do not rely entirely on the output in Figure 8; they also focus on the rules that match 100%. The rule-match percentages of Figure 10 are also presented in graphical form for ease of interpretation.
Figure 11 indicates whether the patient has given a correct, an incorrect, or no response for a particular rule. For example, if rule-1 expects 3 questions as per the knowledge base and the patient affirms all the corresponding symptoms that he/she is facing, then the rule percentage will be 100%.
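The rule percentage in this example is the share of a rule's expected symptoms that the patient affirmed. A minimal Java sketch is given here, assuming hypothetical names (`RuleMatchSketch`, `matchPercent`) and integer symptom identifiers; it is not taken from the implemented system.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: percentage of a rule's expected symptoms that the patient affirmed.
final class RuleMatchSketch {
    static double matchPercent(List<Integer> expected, Set<Integer> affirmed) {
        if (expected.isEmpty()) {
            return 0.0; // a rule with no expected symptoms cannot be matched
        }
        long hits = expected.stream().filter(affirmed::contains).count();
        return 100.0 * hits / expected.size();
    }

    public static void main(String[] args) {
        // Rule-1 expects 3 symptoms and the patient affirms all of them.
        System.out.println(matchPercent(List.of(1, 2, 3), Set.of(1, 2, 3))); // prints 100.0
    }
}
```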
As noted above, the expert system has been validated on 5 patients. Figures 6 to 11 demonstrate the result for a single patient; the results for the remaining 4 validated patients are given below.
The expert system’s inference engine produces the outcome for the remaining 4 patients in the form of a probability for each type of diabetes.
Type 1: 24.1026; Type 2: 23.7137; GS with probability: 29.0562
Type 1: 25.8260; Type 2: 23.2173; PD with probability: 34.7826
Type 1: 23.2074; Type 2: 22.3898; PD with probability: 34.1106
Type 1: 22.7211; Type 2: 21.8957; PD with probability: 33.3404
The percentage to which each rule is satisfied/matched is given for all the validated patients in tabular form (Table 2).
|Name of rule||Patient 1||Patient 2||Patient 3||Patient 4||Patient 5|
Table 2: Rule matched with percentage for several patients.
Figure 12 below indicates, for a single patient, whether a correct, an incorrect, or no response has been given for a particular rule.
Conclusion and Future Direction
Diabetes is a condition of the human body that causes blood sugar levels to rise and stay higher than normal on a consistent basis. Diabetes can cause serious health complications, including blindness, high blood pressure, heart disease, kidney disease, and nerve damage, which can be life threatening.
Several people have died of diabetes because of a lack of proper information and inaccurate identification of the disease, especially in rural areas where medical experts and pathological labs are not easily available.
In this work, to expedite the diagnosis of diabetes, a questionnaire-based rule-based expert system is presented. It contains a list of questions about the symptoms of both male and female patients, on the basis of which the patient is diagnosed. The expert system classifies diabetes into type 1, type 2, prediabetes, and gestational based on the presented symptoms. A fuzzy logic rule base has been applied to develop the decision support system for diabetes. Fuzzy logic is well suited to developing knowledge-based systems for medical diseases. Several questionnaire-based rule-based methodologies have already been implemented for medical disease diagnosis. With the proposed system, experienced physicians, together with their existing medical knowledge, can increase the probability of diagnosing diabetes accurately.
The proposed expert system is useful for patients as well as for physicians. It is not developed for direct use by patients; rather, it helps physicians make a correct diagnosis by determining the type of diabetes the patient is most likely suffering from and its probability of occurrence. Patients become aware of their condition simply by reporting their symptoms to the physician as yes or no answers.
This expert system will be beneficial in rural areas, where physicians, pathological labs, and laboratory tests are not easily available. A limitation is that it has been validated only on a particular demographic and not in rural areas or other places, so it does not guarantee 100 percent accuracy. The primary goal of clinical decision support (CDS) systems is to improve the overall health of the people.
The proposed expert system is designed for only one disease. However, the system can be further extended to large databases, and the methodology can be extended to the diagnosis of several other diseases.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.
The authors would like to thank, first, all the patients of Bombay Medical Hall, Ranchi, India, who gave us information very patiently; then Dr. Vinay Kumar Dhandhenia, Diabetologist, Sneha Verma, Dietitian, Linus ji, and all the remaining staff of Bombay Medical Hall, Ranchi, India, who helped us collect and validate the dataset of diabetes patients; and also Mr. Abishek (Software Architect, Teqforce solution, Ranchi), who helped us with the coding portion.