# Measure data predictability with Shannon Entropy

Entropy measures the average amount of information produced by a data source, i.e. how unpredictable its output is. It can be used to estimate whether data is likely to be structured (repetitive, compressible, low entropy) or unstructured (random-looking, high entropy).

## Definition

Given a discrete random variable $X$, which takes values in the alphabet $\mathcal X$ and is distributed according to $p:\mathcal X\to [0,1]$, entropy is defined as:

$H(X):=-\sum_{x\in\mathcal X}p(x)\log p(x)$

The unit of $H(X)$ depends on the base used for the $\log$:

- Base 2: bits or “shannons”
- Base $e$: nats (for natural units)
- Base 10: “dits”, “bans”, or “hartleys”
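
The definition above can be applied to a byte sequence by using symbol frequencies as the estimate of $p$. A minimal sketch (the name `shannon_entropy` is illustrative, not a standard API):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes, base: float = 2.0) -> float:
    """Empirical Shannon entropy of a byte sequence.

    `base` selects the unit: 2 -> bits (shannons), math.e -> nats,
    10 -> hartleys.
    """
    if not data:
        return 0.0
    n = len(data)
    # p(x) is estimated as count(x) / n for each distinct byte value x.
    return -sum((c / n) * math.log(c / n, base)
                for c in Counter(data).values())

# A single repeated symbol has probability 1, so the entropy is 0:
print(shannon_entropy(b"aaaaaaaa"))  # 0.0 bits per symbol

# Eight equally likely symbols give log2(8) = 3 bits per symbol:
print(shannon_entropy(b"abcdefgh"))  # 3.0 bits per symbol
```

Comparing these two values illustrates the structured-vs-unstructured test: the repetitive input scores far below the maximally varied one.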

This article is a work in progress.