Measure data predictability with Shannon Entropy
Entropy measures the rate at which information is produced by a source of data. It can be used to detect whether data is likely to be structured or unstructured.
Definition
Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p : \mathcal{X} \to [0, 1]$, entropy is defined as:

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$
The unit of $H(X)$ depends on the base of the logarithm:
- Base 2: bits or “shannons”
- Base $e$: nats (for "natural units")
- Base 10: "dits", "bans", or "hartleys"
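The definition above can be sketched in a few lines of Python, treating the empirical symbol frequencies of a string as the distribution $p$. This is a minimal illustration, not a library implementation; the function name `shannon_entropy` is invented for this example.

```python
import math
from collections import Counter

def shannon_entropy(data, base=2):
    """Estimate Shannon entropy of a sequence from its symbol frequencies.

    Implements H = -sum_x p(x) * log_base(p(x)), where p(x) is the
    observed relative frequency of symbol x in `data`.
    """
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())

# Four equally likely symbols carry exactly 2 bits per symbol (base 2).
print(shannon_entropy("abcd"))

# Repetitive (structured) data scores lower than varied, random-looking data,
# which is the basis for the structured-vs-unstructured heuristic above.
print(shannon_entropy("aaaaaaab") < shannon_entropy("a7f$Qz0!"))
```

Changing the `base` argument yields the same quantity in nats (base $e$) or hartleys (base 10).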
This article is a work in progress.