21 July 2016

Zipf! An Elegant Formula that Predicts How Often You'll Use Words

On occasion, we stumble across something so elegant that it makes us stop. We suddenly see order in what we once thought of as random activity, hear the voice in the din.

Zipf's law states that within any large body of text, a word's popularity rank multiplied by the number of times it shows up comes out to roughly the same value for every word.

Rank × Number = constant.
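
As a rough illustration of the rule above, here is a minimal Python sketch that counts the words in a text, ranks them by frequency, and multiplies each rank by its count. The function name rank_frequency_table and the simple tokenizer are my own assumptions for illustration, not anything taken from Dataclysm. The sample sentence is a toy, so it only shows the mechanics; on a large corpus the products should come out roughly, not exactly, constant.

```python
from collections import Counter
import re


def rank_frequency_table(text, top_n=10):
    """Return (rank, word, count, rank * count) rows for the most common words.

    Hypothetical helper for illustration: words are lowercased runs of
    letters and apostrophes, which is a deliberately crude tokenizer.
    """
    words = re.findall(r"[a-z']+", text.lower())
    most_common = Counter(words).most_common(top_n)
    return [
        (rank, word, count, rank * count)
        for rank, (word, count) in enumerate(most_common, start=1)
    ]


if __name__ == "__main__":
    # Toy input only; a real check needs a book-length text.
    sample = "the cat and the dog and the bird saw the fox"
    for rank, word, count, product in rank_frequency_table(sample, top_n=2):
        print(rank, word, count, product)
```

To try it on a real book, read the file into a string and pass it in place of sample, then compare the last column across the top few dozen ranks.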

This law holds for the Bible, pop songs, and literature. Here's the law applied to James Joyce's Ulysses:


That is an absurdly robust prediction. I don't care who you are, that's abundantly cool.

Dataclysm by Christian Rudder is the source for this gem of an idea.

3 comments:

  1. Elegant Formula to predict what I'll write or say? I don't need no stinkin' elegant formula! I'm married!! My wife has been foretelling for years what I'm going to write or say.

  2. Sorry, but the power law doesn't contain any information about what anyone "will" say... Zipf's law says that, for any sufficiently large corpus of text, the distribution will be like the one you described: frequency proportional to (1/rank)^s. But if you already had a corpus that follows Zipf's law and you wanted to add a word to it, you wouldn't be able to predict that word from the probabilities alone...

    It's surprising indeed. But not a prediction. An interesting short story that follows Zipf's law and uses a language written by chance is The Library of Babel, by Jorge Luis Borges, an Argentinian writer.

  3. Al - I could have predicted that you'd say that.

    Social - you are so right. I played with the title, went with something that I thought was fun ... and then didn't check back on the reality of it.
    Now I need to come up with something more accurate. Oh, and thanks for the additional information.
