Creator: Henk-Jan Lebbink
Function: Scores the relevance of variables in datasets
Input Type: xlsx, discrete
Input from Agent: Dizzy, Data Selection Agent
Output Type: Selection of variables
Output to Agent: Casey, Moku, Rikku
Short Description:

The Lenny agent calculates the amount of information that two properties or variables have in common. This common, or mutual, information is invaluable when we need to reduce the number of variables for further analysis. In general, we can score thousands of properties of individuals, yet most of these properties will be redundant. A variable is redundant when it adds only a small amount of information that can be used to predict a selected output variable. Put differently, if we want to predict the value of an output variable, we prefer the input variable that has the most mutual information with it, that is, the one that tells us the most about the output variable.
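
As an illustration of the idea above, the sketch below estimates the mutual information (in bits) between discrete columns and ranks input variables by that score. It is a minimal example under stated assumptions, not Lenny's actual implementation: the function names `mutual_information` and `rank_inputs` and the toy data are hypothetical, and probabilities are estimated from simple counts.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of raw counts
        total += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return total


def rank_inputs(inputs, output):
    """Return (name, score) pairs sorted by mutual information with the output."""
    scores = {name: mutual_information(col, output) for name, col in inputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "copy" mirrors the output, "noise" is independent of it.
output = [0, 0, 1, 1]
inputs = {"noise": [0, 1, 0, 1], "copy": [0, 0, 1, 1]}
ranking = rank_inputs(inputs, output)
print(ranking)  # "copy" scores 1 bit, "noise" scores 0 bits
```

In this toy case the variable that duplicates the output carries one full bit about it, while the independent variable carries none, so only the first would be worth keeping.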

The Lenny agent uses Shannon's information theory to calculate the mutual information between each input variable and the output variable. In addition, Lenny calculates the interaction information between two or more input variables and the output variable. When a set of input variables jointly shares more information with the output variable than the sum of the mutual information between each individual input variable and the output variable, we speak of interaction synergy. This additional, non-linear information makes such sets of input variables interesting for further analysis.
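
Interaction synergy can be made concrete with an exclusive-or (XOR) example, sketched below. Neither input alone tells us anything about the output, yet the two inputs together determine it completely. This is an illustrative sketch rather than Lenny's implementation: the name `synergy` is hypothetical, and the gain is defined here as the joint mutual information minus the summed individual mutual information, following the description above (sign conventions for interaction information differ between authors).

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def synergy(input_columns, output):
    """Joint information of all inputs about the output, minus the summed
    information of each input taken separately (assumed definition)."""
    # Treat each row of input values as one combined discrete symbol.
    joint_rows = list(zip(*input_columns))
    joint = mutual_information(joint_rows, output)
    separate = sum(mutual_information(col, output) for col in input_columns)
    return joint - separate


# Hypothetical toy data: the output is the XOR of the two inputs.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

print(mutual_information(x1, y))  # 0.0
print(mutual_information(x2, y))  # 0.0
print(synergy([x1, x2], y))       # 1.0
```

A positive value indicates that the set of inputs is worth examining together, even though each input would be discarded if scored on its own.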

Note that Lenny does not give us a function to predict the values of output variables, as Moku and Rikku do; instead, it gives us the set of variables that carries the most information for predicting the outputs.


© 2015 Alan Turing Institute Almere