A neuro-fuzzy inference system for stakeholder classification

Stakeholder classification is carried out manually using methods such as brainstorming, interviews with experts, and checklists. These methods are subjective because they depend on the judgment of the interviewees, which affects the accuracy of the classification and can lead project managers to make incorrect decisions. This research proposes a fuzzy inference system for stakeholder classification that improves the quality of this classification in projects. The proposal performs the learning and tuning of the fuzzy inference system with four algorithms based on artificial neural networks: ANFIS, HYFIS, FS.HGD, and FIR.DM. The results of applying them over 10 iterations are analyzed using the percentage of correct classifications, the number of false-positive cases, the number of false-negative cases, and the mean squared error. The ANFIS algorithm shows the best results. The resulting fuzzy inference system improves the quality of stakeholder classification through machine learning, allowing better decisions to be made in a project.


INTRODUCTION
The lack of success of a project is related to its stakeholders and their engagement in the project's decisions. The CHAOS Report [1] shows that a significant number of software projects do not conclude successfully, and only 29% are considered satisfactory. The report analyzes the elements considered relevant to achieving a successful project, a large part of which are directly related to stakeholder management [2].
Stakeholder management in a project includes the processes necessary to identify stakeholders, evaluate their expectations and influence on the project, and develop appropriate management strategies to achieve their effective involvement in decision-making. Correct identification and classification of stakeholders help the project leader focus on the relationships needed to ensure the project's success [3].
The stakeholder classification process is usually performed by project leaders using methods such as brainstorming, expert interviews, and checklists [4]. These techniques use different properties to characterize stakeholders, and they are applied manually and subjectively by people linked to the projects.
One way to address this problem is to apply machine learning. Machine learning techniques give computer tools an approximation of human reasoning based on accumulated knowledge and experience [5]. These methods are powerful in environments where the data have imprecise values; they allow the development of low-cost solutions with greater modelling capacity [6,7].
Among these techniques are artificial neural networks (ANNs), mathematical models made up of many processing elements organized in layers. Their study aims to use components whose structure and operation make it possible to solve problems, including classification problems. ANNs perform well as classifiers and can be used where traditional techniques fail [8].
These shortcomings in the manual classification of stakeholders affect its accuracy, and as a result project managers cannot make the best decisions involving stakeholders. The objective of this research is to propose a neuro-fuzzy system for stakeholder classification that improves the quality of the classification carried out by project leaders.

RELATED WORKS
As part of the research, the stakeholder classification process and the attributes used in it are studied. Next, the fundamental elements of fuzzy inference systems (FIS) and artificial neural networks are analyzed. Then the application of four ANN-based algorithms to the generation and optimization of fuzzy inference systems is described.

Stakeholder classification
The stakeholder classification process aims to categorize stakeholders according to their characteristics, roles, expectations, benefits, and pressure on the project. Once the stakeholders have been identified and their data captured, they are categorized to support the project's success. This classification lets the project leader focus on the relationships the project needs [9].
There are numerous methods for categorizing stakeholders, among which is Mitchell's salience model [10]. This method classifies stakeholders based on three variables: power, legitimacy, and urgency. Power is the stakeholder's capacity to influence the project; legitimacy refers to the association of the stakeholder and its actions with the project in terms of prestige or appropriateness; and urgency refers to how immediately the project must attend to the stakeholder's requirements. According to [11,12,13,14], this technique is the most widely used and discussed in this field.
In [10], the power variable is associated with the ability to obtain coercive resources (physical force, weapons), utilitarian resources (technology, money, knowledge, logistics, raw materials), and symbolic resources (prestige, esteem, charisma) that allow a stakeholder to impose its will on others in the organization. Legitimacy can be measured through two attributes, organizational legitimacy and social legitimacy. The first expresses the degree to which the stakeholder's actions are considered desirable at the organizational level, and the second at the social level [10].
The urgency variable is defined by two attributes: temporal sensitivity and criticality. The first is the degree to which the stakeholder finds it unacceptable for the manager to delay attention to its claims. The second is the importance the stakeholder attaches to its claims or issues [10].
For each of these attributes, specialists rate the degree to which each stakeholder possesses it. These ratings contain imprecision and vagueness. This research addresses that problem by applying fuzzy inference systems and artificial neural network methods, which are described in the following sections.
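To make the brittleness of a crisp rating concrete, the following sketch classifies a stakeholder with hard thresholds, in the spirit of Mitchell's typology (one attribute held: latent; two: expectant; three: definitive). The [0, 1] rating scale, the use of the maximum to aggregate sub-attributes, the 0.5 threshold, and the mapping of Mitchell's groups onto the three priority labels are all assumptions for illustration, not the procedure used in this research.

```python
from dataclasses import dataclass

@dataclass
class Stakeholder:
    # Hypothetical expert ratings of Mitchell's sub-attributes, each in [0, 1].
    coercive: float
    utilitarian: float
    normative: float
    org_legitimacy: float
    social_legitimacy: float
    temporal_sensitivity: float
    criticality: float

def salience_variables(s: Stakeholder) -> dict:
    # Assumption: each variable takes the strongest of its sub-attributes.
    return {
        "power": max(s.coercive, s.utilitarian, s.normative),
        "legitimacy": max(s.org_legitimacy, s.social_legitimacy),
        "urgency": max(s.temporal_sensitivity, s.criticality),
    }

def crisp_priority(s: Stakeholder, threshold: float = 0.5) -> str:
    # A variable counts as "possessed" only if it reaches the hard threshold.
    held = sum(v >= threshold for v in salience_variables(s).values())
    # Mitchell: 1 variable = latent, 2 = expectant, 3 = definitive.
    # Mapping these groups onto the three priority labels is an assumption.
    return {0: "Not prioritized", 1: "Not prioritized",
            2: "Less prioritized", 3: "Highly prioritized"}[held]
```

In this sketch, lowering temporal sensitivity from 0.50 to 0.49 drops a stakeholder from Highly prioritized to Less prioritized, which is the kind of discontinuity a fuzzy inference system smooths out.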

Fuzzy inference systems
A fuzzy inference system emulates human reasoning, allowing it to handle the ambiguity, uncertainty, and vagueness of information. These systems are considered expert systems with approximate reasoning that map an input vector to a single output based on fuzzy logic [15]. They use a knowledge base expressed as conditional rules and operate on fuzzy sets. There are three main fuzzy inference models, defined in the research of Mamdani [16], Sugeno [17], and Tsukamoto [18].
The Mamdani model is the most commonly used, as it is considered more intuitive and closer to human language, and it can be transformed into the Sugeno type [19]. The Sugeno model is better suited to mathematical analysis and does not need a defuzzification process, since each rule has a crisp output value; an average or weighted sum of these values gives the final result [20]. In Tsukamoto's model, the consequent of each fuzzy rule is represented by a fuzzy set with a monotonic membership function, so each rule yields a crisp value and no defuzzification process is performed [21].
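The Sugeno inference described above can be sketched in a few lines: each rule's firing strength weights a crisp consequent, and the output is their weighted average, so no defuzzification step is needed. The two inputs (power and urgency on a [0, 1] scale), the triangular labels, and the 3 x 3 rule base with consequent (i + j) / 4 are hypothetical choices made only for illustration.

```python
from itertools import product

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical linguistic labels on a normalized [0, 1] rating scale: low, medium, high.
LABELS = [(0.0, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.0)]

def sugeno_zero_order(power, urgency):
    """Zero-order Sugeno inference over a hypothetical 3 x 3 rule base."""
    num = den = 0.0
    for i, j in product(range(3), repeat=2):
        # Rule: IF power is LABELS[i] AND urgency is LABELS[j] THEN priority = (i + j) / 4
        w = tri(power, *LABELS[i]) * tri(urgency, *LABELS[j])  # product t-norm
        num += w * (i + j) / 4
        den += w
    # Weighted average of the crisp rule outputs: no defuzzification step.
    return num / den
```

Because adjacent labels overlap across the whole [0, 1] range, at least one rule always fires, so the denominator is never zero for inputs in that range.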
The rules of a fuzzy inference system can be established statically from the knowledge and experience of experts in the area. This approach does not allow the system to adapt to changes in the organization, and it depends on the knowledge of the people involved. It is therefore advisable to use optimization methods that adjust the rules automatically as the application environment evolves.
Different techniques are used for the machine learning of fuzzy rules. One strategy generates a set of initial rules and then refines them. A variant within this approach creates fuzzy rules by partitioning the space of possible solutions using supervised or unsupervised learning. Within this approach, learning based on artificial neural networks has proven effective [22].

Artificial neural networks
Artificial neural networks are computational models that aim to simulate the functioning of the human brain through an architecture that borrows characteristics of how this organ works without actually duplicating it. ANNs learn from experience, generalizing from previous examples to new ones. They are used for prediction, data mining, pattern recognition, and adaptive control systems, among other applications [23].
In general, artificial neural networks can be classified according to their topology, learning method (supervised or unsupervised), types of activation functions, and input values (binary or continuous). Learning is the process in which data are presented to the network so that it learns to recognize patterns in them. In supervised learning, labelled patterns serve as examples for the network [24].
Among the adjustable parameters of an ANN are the activation functions of each neuron, which may be restricted depending on the type of neural network selected. Another important element is the weight of each input. The weights can be initialized randomly or by some algorithm, and they are updated as the neural network is trained [25].
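A single artificial neuron illustrates the adjustable elements mentioned above: an activation function (here a sigmoid), input weights initialized at random, and weights updated during supervised training. This is a minimal sketch using a gradient-descent update for the cross-entropy loss on the logical AND function; the task, learning rate, and epoch count are arbitrary illustrative choices.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, lr=0.5, epochs=5000, seed=0):
    """Train one sigmoid neuron; samples is a list of (inputs, target) pairs."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Weights and bias start at small random values.
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    b = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, t in samples:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # t - y is the negative gradient of the cross-entropy loss
            # with respect to the neuron's pre-activation.
            err = t - y
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(AND)
```

The labelled AND cases play the role of the catalogued patterns used in supervised learning.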

Adaptive Neuro-Fuzzy Inference System
In [26], one of the first hybrid neuro-fuzzy models based on adaptive networks (ANFIS) is introduced. It is a Sugeno-type fuzzy inference system implemented as a multilayer artificial neural network with Gaussian membership functions. Optimization is performed by adjusting the parameters of the antecedent membership functions and the consequents of the rules. Learning is divided into two stages: the consequent parameters are adjusted by least squares, and then the antecedent parameters are adjusted by gradient descent.
This technique has five layers of neurons in which the fuzzy rules are generated and optimized. Each node in layer one receives the numerical value of an attribute, calculated in the previous step, and computes the degree of membership of that value in the fuzzy set the node represents. The membership functions associated with these fuzzy sets must be continuous and piecewise differentiable so that gradient descent can be applied during learning. The nodes of layer two represent the rules; each is connected to its corresponding antecedents in layer one and receives their membership degrees as input. Each of these nodes computes the activation degree of its rule by applying a t-norm operator that models logical conjunction.
In layer three, the activation degrees of the rules obtained in the preceding layer are normalized. In layer four, these normalized degrees are multiplied by the individual outputs of each rule. The layer five node calculates the overall output of the system as the sum of all incoming signals, giving the stakeholder classification on the scale highly prioritized, less prioritized, and not prioritized.
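The five ANFIS layers described above can be traced in a forward pass. This sketch assumes two inputs with two Gaussian fuzzy sets each (four rules), the product as the t-norm, and first-order Sugeno consequents f = p1*x1 + p2*x2 + r; all parameter values are hypothetical, and the hybrid learning step (least squares for consequents, gradient descent for antecedents) is not shown.

```python
import math
from itertools import product

def gauss(x, c, sigma):
    """Gaussian membership function with center c and spread sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def anfis_forward(x, mfs, consequents):
    """x: input values; mfs[i]: (center, sigma) pairs for input i;
    consequents[k]: (p_1, ..., p_n, r) for rule k, in itertools.product order."""
    # Layer 1: membership degree of each input in each of its fuzzy sets.
    mu = [[gauss(xi, c, s) for c, s in sets] for xi, sets in zip(x, mfs)]
    # Layer 2: one rule per combination of sets; firing strength via product t-norm.
    combos = list(product(*(range(len(sets)) for sets in mfs)))
    w = [math.prod(mu[i][j] for i, j in enumerate(combo)) for combo in combos]
    # Layer 3: normalize the firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weight each rule's first-order output f = p . x + r.
    f = [sum(p * xi for p, xi in zip(coef[:-1], x)) + coef[-1] for coef in consequents]
    weighted = [wb * fk for wb, fk in zip(w_bar, f)]
    # Layer 5: overall output is the sum of the incoming signals.
    return sum(weighted)

# Hypothetical parameters: two inputs, each with "low" and "high" Gaussian sets.
MFS = [[(0.0, 0.3), (1.0, 0.3)], [(0.0, 0.3), (1.0, 0.3)]]
```

Because layer three normalizes the firing strengths, giving every rule the same consequent function returns that function's value regardless of the membership parameters, which is a quick sanity check. Mapping the continuous output onto the three priority classes (for example, by rounding an ordinal code) would be an additional assumption.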

Hybrid Neuro-Fuzzy Inference System
In [27], a hybrid neuro-fuzzy inference system (HYFIS) is proposed to build and optimize fuzzy systems. The model combines the learning power of neural networks with fuzzy inference systems and gives linguistic meaning to connectionist architectures. It is a five-layer neural network that is functionally equivalent to a fuzzy inference system with Mamdani-type rules, and it allows the membership functions of the fuzzy sets and the rules to be adapted to the training cases.
Fuzzy rules are optimized using a hybrid learning scheme with two phases: generating rules from the data and adjusting them by backpropagation. In the first phase, the rule base is built by the knowledge acquisition module. In the second phase, the parameters of the membership functions are adjusted to reach an adequate level of performance. An advantage of this approach is how easily the fuzzy rule base can be modified as new data become available: when a new training case arrives, a rule is created for it and added to the rule base.
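HYFIS builds its initial rule base from the data before tuning it by backpropagation. The sketch below shows one common way to do this, rule generation in the style of Wang and Mendel: each training case becomes a rule whose antecedents are the labels with the highest membership, and conflicting rules are resolved by keeping the one with the larger degree. Whether this matches the exact HYFIS knowledge acquisition module is an assumption, and the three triangular labels on [0, 1] are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical linguistic labels shared by all variables on [0, 1].
LABELS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}

def best_label(x):
    """Return (label, membership) for the label in which x has the highest degree."""
    return max(((name, tri(x, *abc)) for name, abc in LABELS.items()), key=lambda p: p[1])

def add_case(rule_base, inputs, output):
    """Turn one training case into a rule and merge it into rule_base (in place)."""
    antecedent, degree = [], 1.0
    for x in inputs:
        name, mu = best_label(x)
        antecedent.append(name)
        degree *= mu
    out_name, out_mu = best_label(output)
    degree *= out_mu
    key = tuple(antecedent)
    # Conflicting rules share an antecedent: keep the one with the higher degree.
    if key not in rule_base or rule_base[key][1] < degree:
        rule_base[key] = (out_name, degree)
    return rule_base
```

Calling add_case once per new case shows how the rule base can grow incrementally as data arrive, before any membership-function tuning takes place.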

Fuzzy inference system based on heuristics and the gradient descent method
In [28], a hybrid method named FS.HGD is presented to refine the rules of a fuzzy inference system. It determines and adjusts the coefficients of the polynomials that form the consequents of the inference rules of a Sugeno-type FIS. The heuristic method determines the coefficients by averaging the expected outputs of the training cases, weighted by the degree of compatibility between each input and the inference rule being analyzed. Its main advantage is simplicity: the polynomial coefficients are not determined by an iterative procedure, which helps when there is not enough time for computational processing.
The gradient descent method provides an iterative way to update the polynomial coefficients of each inference rule. The error is measured as the root mean square error between the expected and obtained outputs for the training cases. Each coefficient is updated from its previous value using the product of the learning rate and the derivative of the mean squared error (delta rule). A large learning rate can prevent the method from converging to the solution; conversely, a small one may require many iterations. The hybrid method determines the initial coefficients with the heuristic method and then updates them with gradient descent.
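The hybrid idea, a heuristic starting point refined by gradient descent, can be sketched for a single input and first-order consequents a + b*x. Here the heuristic sets each rule's constant term to the average target weighted by the rule's compatibility with each training input and starts every slope at zero; the zero slope, the triangular sets, and the learning rate are assumptions, not the exact FS.HGD formulation. Gradient descent then subtracts the learning rate times the derivative of the mean squared error from every coefficient.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical partition of a single input on [0, 1]: low, medium, high.
SETS = [(0.0, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.0)]

def norm_firing(x):
    w = [tri(x, *abc) for abc in SETS]
    total = sum(w)
    return [wi / total for wi in w]

def predict(coefs, x):
    # Sugeno output: normalized firing strengths times first-order consequents a + b*x.
    return sum(wb * (a + b * x) for wb, (a, b) in zip(norm_firing(x), coefs))

def mse(coefs, data):
    return sum((t - predict(coefs, x)) ** 2 for x, t in data) / len(data)

def heuristic_init(data):
    """Constant term = target average weighted by rule compatibility; slope = 0 (assumption)."""
    coefs = []
    for abc in SETS:
        den = sum(tri(x, *abc) for x, _ in data)
        num = sum(tri(x, *abc) * t for x, t in data)
        coefs.append([num / den if den else 0.0, 0.0])
    return coefs

def refine_by_gradient(coefs, data, lr=0.1, epochs=500):
    """Full-batch gradient descent on the MSE with respect to every coefficient."""
    coefs = [list(c) for c in coefs]
    n = len(data)
    for _ in range(epochs):
        grads = [[0.0, 0.0] for _ in coefs]
        for x, t in data:
            err = t - predict(coefs, x)
            for r, wb in enumerate(norm_firing(x)):
                grads[r][0] += -2.0 * err * wb / n
                grads[r][1] += -2.0 * err * wb * x / n
        for c, g in zip(coefs, grads):
            c[0] -= lr * g[0]
            c[1] -= lr * g[1]
    return coefs

data = [(i / 10, (i / 10) ** 2) for i in range(11)]
start = heuristic_init(data)
tuned = refine_by_gradient(start, data)
```

Since the output is linear in the coefficients, the mean squared error is a convex quadratic in them, so a sufficiently small learning rate makes each step reduce it, while a rate that is too large can make it diverge.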

Fuzzy inference system based on the gradient descent method
In [29], an algorithm for learning fuzzy inference rules by a descent method (FIR.DM) is proposed. The inference rules that express the relationships in the data are obtained automatically from input-output data. The membership functions in the antecedent part and the real numbers in the consequent part of the inference rules are tuned by the descent method.
The input values are converted to fuzzy sets in the recognition module, which specifies their membership degrees. Then the firing strength of each rule is calculated as the product of the membership degrees of the antecedents that make up the rule. Finally, the output is obtained as the average of the rule weights, weighted by their firing strengths.
The training process iteratively optimizes the fuzzy system parameters from the values calculated by the system and the values desired for each input. The method requires the following initial conditions: linearly spaced fuzzy sets, with the bases of adjacent sets overlapping each other, and initial fuzzy rule weights of 0.5.
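The initial conditions listed above can be sketched as follows: linearly spaced isosceles triangular sets whose bases reach the neighboring peaks, a rule for every combination of input sets with weight 0.5, product firing strengths, and a firing-weighted average of the rule weights as output. For brevity, this sketch applies the descent step only to the rule weights; the method described in [29] also tunes the membership functions, which is omitted here. The grid-shaped rule base and all function names are illustrative assumptions.

```python
import math
from itertools import product

def init_sets(n_sets, lo=0.0, hi=1.0):
    """Linearly spaced isosceles triangles as (center, width); bases reach neighboring peaks."""
    step = (hi - lo) / (n_sets - 1)
    return [(lo + k * step, 2 * step) for k in range(n_sets)]

def mu(x, center, width):
    return max(0.0, 1.0 - 2.0 * abs(x - center) / width)

def build_rules(n_inputs, n_sets):
    # One rule per combination of input sets; every weight starts at 0.5.
    return {combo: 0.5 for combo in product(range(n_sets), repeat=n_inputs)}

def infer(rules, sets, x):
    # Firing strength = product of the antecedent membership degrees.
    strengths = {c: math.prod(mu(xi, *sets[k]) for xi, k in zip(x, c)) for c in rules}
    total = sum(strengths.values())
    # Output = average of rule weights, weighted by firing strength.
    y = sum(strengths[c] * w for c, w in rules.items()) / total
    return y, strengths, total

def train(rules, sets, data, lr=0.5, epochs=200):
    for _ in range(epochs):
        for x, t in data:
            y, strengths, total = infer(rules, sets, x)
            for c in rules:
                # Descent step on E = (t - y)^2 / 2 with respect to the weight of rule c.
                rules[c] += lr * (t - y) * strengths[c] / total
    return rules

sets = init_sets(3)
rules = build_rules(2, 3)
y0 = infer(rules, sets, (0.3, 0.6))[0]  # all weights are 0.5, so the output is 0.5
data = [((a, b), a * b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
train(rules, sets, data)
```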

NEURO-FUZZY INFERENCE SYSTEM FOR STAKEHOLDER CLASSIFICATION
Below, the development environment used for learning the stakeholder classification system is described. Next, the parameters of the algorithms and the characteristics of the dataset used in the process are shown.

Working environment and algorithm parameters
The relational database management system PostgreSQL and the R language are used to apply the algorithms that adjust the parameters of the fuzzy inference system. R is a programming language and environment focused on statistical analysis, and it is also very popular in data mining.
R and PostgreSQL are integrated through the PL/R extension, which allows CRAN packages to be used from the database. Among these packages is FRBS, published in [30]. FRBS is based on the concept of fuzzy logic proposed in [15] and implements fuzzy rule-based systems for various problems using soft computing techniques. The parameters used to learn the fuzzy inference system are shown in Table 1.

Training and test dataset
The learning process uses a dataset of previously classified stakeholders. It contains the attribute values of 137 stakeholders and the classification assigned by experts: Not prioritized, Less prioritized, or Highly prioritized. The attributes of each stakeholder coincide with those of Mitchell's model: coercive power, utilitarian power, normative-social power, organizational legitimacy, social legitimacy, temporal sensitivity, and criticality. The dataset has the following distribution: 62 highly prioritized stakeholders (45%), 53 less prioritized (39%), and 22 not prioritized (16%); it contains no null or out-of-range values. It is divided randomly into ten different partitions. Each partition has 110 cases (80%) for training and 27 cases (20%) for validating the training, giving ten executions of each algorithm.
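The partitioning scheme can be sketched as ten independent random splits of the 137 case indices into 110 training and 27 test cases. The seed and function name are illustrative; the splits are not stratified by class, following the description of a random division (stratifying by class would be an alternative design choice).

```python
import random

# Class counts reported for the dataset (137 stakeholders in total).
CLASS_COUNTS = {"Highly prioritized": 62, "Less prioritized": 53, "Not prioritized": 22}

def make_partitions(n_cases=137, n_train=110, n_partitions=10, seed=2024):
    """Return n_partitions independent random (train, test) splits of case indices."""
    rng = random.Random(seed)
    partitions = []
    for _ in range(n_partitions):
        idx = list(range(n_cases))
        rng.shuffle(idx)
        partitions.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return partitions

parts = make_partitions()
```

Each algorithm is then trained on the first index list of a partition and validated on the second, once per partition.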

RESULTS AND DISCUSSION
The proposed system for stakeholder classification was applied to the software development projects of the University of Computer Sciences. The stakeholders of ten projects of the Computerization Department were selected to be classified by the proposed system. The stakeholders had previously been classified by an expert, which provided the actual classification. They were divided into a training group and a testing group. The training group, together with the expert's classification, was supplied to the proposed system so that it could learn. Then the learning verification phase was carried out, in which the system processes the testing group without its prior classification. The classifications returned by the system for the testing group can then be compared with those given by the expert.
Cross-validation, in the form of repeated random train/test splits, was used to validate the system. The training and test data were randomly selected, and the four classification algorithms were applied to these sets. This process was repeated ten times, and the results of the ten random iterations were compared. These results allow the performance of the generated system to be compared across the algorithms used.
The following metrics are taken into account to validate the training of the neuro-fuzzy system: percentage of correct classifications, number of false negatives, number of false positives and mean square error. Next, the results of each of these metrics are analyzed in the validation of the training.
The percentage of correct classifications (%CC) is the percentage of stakeholders correctly classified by the system. Figure 1 shows a comparison between all the algorithms implemented for the ten partitions of the data.
Figure 1. Percentage of correct classifications by algorithms.
The number of false positives (FP) is the number of stakeholders classified in a higher category than the one to which they belong; that is, how many stakeholders have a lower priority than the one determined by the system. Figure 2 compares all the algorithms implemented for the ten partitions of the data.
The number of false negatives (FN) is the number of stakeholders classified in a lower category than the one to which they actually belong; that is, how many stakeholders have a higher priority than the one determined by the system. Figure 3 compares all the algorithms implemented for the ten partitions of the data.
The mean squared error (MSE) is the average of the squared differences between the class assigned by the system and the class assigned by the expert. Figure 4 shows a comparison between all the algorithms implemented for the ten data partitions.
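Treating the three classes as ordered (Not prioritized < Less prioritized < Highly prioritized) makes the four metrics straightforward to compute. This sketch assumes the integer codes 0, 1, 2 for the classes and computes the MSE on those codes; the encoding used in the actual experiments is not stated, so it is an assumption here.

```python
# Assumed ordinal encoding of the three classes.
ORDER = {"Not prioritized": 0, "Less prioritized": 1, "Highly prioritized": 2}

def evaluate(expected, predicted):
    """Compute %CC, FP, FN, and MSE for one validation partition."""
    e = [ORDER[c] for c in expected]
    p = [ORDER[c] for c in predicted]
    n = len(e)
    return {
        "%CC": 100.0 * sum(a == b for a, b in zip(e, p)) / n,
        "FP": sum(b > a for a, b in zip(e, p)),  # placed in a higher category than the expert's
        "FN": sum(b < a for a, b in zip(e, p)),  # placed in a lower category than the expert's
        "MSE": sum((b - a) ** 2 for a, b in zip(e, p)) / n,
    }
```

With this encoding, a two-level error (Not prioritized classified as Highly prioritized) weighs four times as much in the MSE as a one-level error, whereas FP and FN count both equally.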
To validate the training, the Shapiro-Wilk test, which is suitable for samples of fewer than 2000 observations, was applied to check the normality of the metric values. It showed that the values of the metrics analyzed above do not follow a normal distribution. Therefore, for each metric, the non-parametric Friedman test for k related samples was applied. The results showed significant differences among the algorithms, so pairwise Wilcoxon signed-rank tests were applied.
The objective of applying the Wilcoxon test is to place algorithms without significant differences between them in the same group, where possible. Table 2 shows the result of this non-parametric test, with the algorithms grouped in ascending order and the best placed in "Group 1" for each metric.
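The Friedman test ranks the four algorithms within each of the ten partitions and checks whether their rank sums differ more than chance would allow. This minimal sketch computes the statistic without a tie correction and uses the closed-form chi-square survival function for k - 1 = 3 degrees of freedom; the pairwise Wilcoxon follow-up and the Shapiro-Wilk check are not shown. In practice a statistics library (for example, R's friedman.test or SciPy's friedmanchisquare) would be used instead.

```python
import math

def average_ranks(values):
    """Rank values 1..k (1 = smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman(blocks):
    """blocks[b][a] = metric of algorithm a on partition b; returns (chi2, p) for k = 4."""
    n, k = len(blocks), len(blocks[0])
    if k != 4:
        raise ValueError("the closed-form p-value below assumes k - 1 = 3 degrees of freedom")
    rank_sums = [0.0] * k
    for row in blocks:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    chi2 = 12.0 * sum(R * R for R in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    chi2 = max(chi2, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function for 3 degrees of freedom.
    p = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
    return chi2, p
```

If the four algorithms keep the same order in all ten partitions, the statistic reaches its maximum value n(k - 1) = 30; if every partition ties all four, it is 0 and the p-value is 1.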
Most of the analyzed metrics show that the adaptive neuro-fuzzy inference system gives better results than the other ANN-based algorithms analyzed. These results agree with [31,32,33], where various techniques for learning fuzzy rules oriented to project management are analyzed. Those works compare techniques with different approaches, such as genetic algorithms, methods based on search space partitioning, methods based on artificial neural networks, and case-based systems. Among all the algorithms analyzed in these studies, ANFIS obtained the best results. This suggests that the adaptive neuro-fuzzy inference system is a suitable strategy for classification problems that use a fuzzy rule-based system.

CONCLUSIONS
The use of machine learning methods for project stakeholder classification increases the accuracy of the result, and these methods adequately handle the uncertainty inherent in the information. The ANFIS algorithm implemented in the fuzzy inference system gives better stakeholder classification results than the other algorithms. The application of artificial neural network algorithms in computer tools is a significant contribution to decision-making in projects.
In future work, the adaptive neuro-fuzzy inference system obtained for classifying project stakeholders can be compared with the systems presented in [34] and [35], which serve the same purpose using genetic and clustering algorithms. Such a comparison could determine which soft computing technique is most suitable for classifying stakeholders in projects.