The frequency interpretation
The frequency interpretation of probability is fairly self-explanatory, in that it defines a probability as a limiting frequency. If we throw a (fair) die often enough, we will find that each number comes up about a sixth of the time. The longer we continue to throw the die, the closer the relative frequency of each number will tend to come to the ideal value of 1/6. The probability of an event is then defined as its relative frequency as the throws continue to an idealized infinity.
It has to be noted that this is seen as an empirical law: there is no mathematical necessity for a die to behave like this, only the fact that all dice have so far been observed to do so. The same holds for biased dice: if the bias is towards 6, say, then that bias will show up as a limiting relative frequency of, say, ¼ as the number of throws goes on towards infinity.
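The limiting-frequency idea can be illustrated, though not proved, with a short simulation. The sketch below assumes Python's standard `random` module as a stand-in for a physical die; the function name `relative_frequency` and the 3:3:3:3:3:5 weighting (chosen so that the six has a limiting frequency of ¼) are hypothetical choices made for this example.

```python
import random


def relative_frequency(face, n_throws, weights=None, seed=0):
    """Throw a simulated six-sided die n_throws times and return the
    relative frequency of `face`. weights=None gives a fair die."""
    rng = random.Random(seed)
    throws = rng.choices(range(1, 7), weights=weights, k=n_throws)
    return throws.count(face) / n_throws


# Hypothetical die biased towards 6: weights 3:3:3:3:3:5 give the six
# a share of 5/20 = 1/4 and each other face a share of 3/20.
BIASED_TOWARDS_SIX = [3, 3, 3, 3, 3, 5]

for n in (60, 600, 6_000, 60_000, 600_000):
    fair = relative_frequency(6, n)
    biased = relative_frequency(6, n, weights=BIASED_TOWARDS_SIX)
    print(f"{n:>7} throws: fair {fair:.4f}  biased {biased:.4f}")
```

Note that the simulation presupposes the law rather than testing it: a pseudo-random generator is designed so that its output frequencies settle down, much as real dice have been observed to do.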
The frequency interpretation was developed in the mid-19th century by Venn and other British mathematicians as an explicitly empirical (British) response to the then-dominant rationalist (French) interpretation of Laplace. 'Empirical' here means that it is based on actual observations of how things behave, while 'rationalist' means that it is based purely on mathematical and philosophical reasoning.
The approach was further developed and popularized in the early 20th century by thinkers associated with the empiricist philosophy of the Vienna Circle, such as Richard von Mises.
In particular, von Mises regarded the law that frequencies will eventually settle down to a definite number in the long run as a natural law of science, of the same kind as, say, the laws of mechanics. The idealisation the law makes by referring to die-throwing continued to infinity may not sound as if it is based on empirical observation, but it is analogous to the idealisations that are often made in other sciences. Von Mises calls this law the “Urphänomen”, or 'primary phenomenon', which shows the importance he attaches to it.
While that law was fundamental to the frequency ideas that were due to Venn and his contemporaries, the second important law was an original addition by von Mises.
Strictly speaking, relative frequencies alone are not quite enough for a definition of probability, because there needs to be an element that represents uncertainty. A (hypothetical) coin that shows heads on the first throw, and after that always alternates between heads and tails, has the same relative frequency of heads as a normal coin. The probability of getting heads on the first throw, however, is not ½; it is 1. What a definition of probability needs is randomness. Von Mises' idea was to make a law out of the fact that in random (and fair) games of chance there is no way of devising a system that helps the gambler guess the next outcome from knowledge of the previous outcomes. The alternating coin therefore does not qualify, because we can predict each outcome as long as we know what came before. With a truly random fair coin, however, the probability of heads is always ½, no matter what pattern the previous tosses showed. In the unlikely event that we get ten tails in a row, the probability of the eleventh throw being heads is still ½.
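A sketch of the two coins in Python, assuming the standard `random` module as a stand-in for a truly random fair coin; the helper names are hypothetical. It shows that relative frequency alone cannot tell the coins apart, and that for the simulated random coin the frequency of heads among throws that follow ten tails in a row stays near ½.

```python
import random


def alternating_coin(n):
    """The hypothetical coin from the text: heads first, then strict alternation."""
    return ["H" if i % 2 == 0 else "T" for i in range(n)]


def random_coin(n, seed=0):
    """A simulated fair coin (a pseudo-random stand-in for a real one)."""
    rng = random.Random(seed)
    return [rng.choice("HT") for _ in range(n)]


def freq_heads(tosses):
    return tosses.count("H") / len(tosses)


def freq_heads_after_tails_run(tosses, run_length):
    """Relative frequency of heads among tosses whose previous `run_length`
    tosses were all tails; None if no such toss occurs."""
    selected = []
    tails_so_far = 0  # length of the run of tails ending at the previous toss
    for toss in tosses:
        if tails_so_far >= run_length:
            selected.append(toss)
        tails_so_far = tails_so_far + 1 if toss == "T" else 0
    return freq_heads(selected) if selected else None


alternating = alternating_coin(1_000_000)
fair = random_coin(1_000_000)
print(freq_heads(alternating))               # exactly 0.5
print(freq_heads(fair))                      # close to 0.5
print(alternating[0])                        # H: the first throw is certain
print(freq_heads_after_tails_run(fair, 10))  # still close to 0.5
```

A run of ten tails is rare (roughly one toss in 1,024 follows one), so even with a million simulated tosses the last figure rests on only about a thousand cases and is noticeably noisier than the overall frequency.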
There is again no mathematical (or empirical) necessity for coin tosses or other games of chance to behave like this, other than the fact that they have been designed to work that way (if they didn't, casinos would lose money). Von Mises nevertheless takes this requirement on games of chance as a definition of randomness: a game is defined as random if no gambling system exists that can take advantage of it.
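Von Mises' criterion can be sketched as a small test harness. Here a 'gambling system' is modelled as a Python function that sees only past outcomes and either names a bet or sits the next toss out; this is a simplified, hypothetical stand-in for von Mises' own formal notion of place selections, and all function names are invented for the example.

```python
import random


def win_rate(outcomes, system):
    """Fraction of bets won when `system` plays against a sequence of tosses.
    The system sees only the outcomes so far and returns "H", "T",
    or None to sit the next toss out."""
    history = []
    wins = bets = 0
    for outcome in outcomes:
        bet = system(history)
        if bet is not None:
            bets += 1
            wins += bet == outcome
        history.append(outcome)
    return wins / bets if bets else None


def bet_opposite_of_last(history):
    """Bet that the next toss differs from the previous one."""
    if not history:
        return None
    return "T" if history[-1] == "H" else "H"


def bet_heads_on_odd_throws(history):
    """Bet heads on the 1st, 3rd, 5th, ... throw and skip the rest."""
    return "H" if len(history) % 2 == 0 else None


n = 100_000
alternating = ["H" if i % 2 == 0 else "T" for i in range(n)]
rng = random.Random(1)
fair = [rng.choice("HT") for _ in range(n)]

for system in (bet_opposite_of_last, bet_heads_on_odd_throws):
    print(system.__name__,
          win_rate(alternating, system),  # 1.0: the system always wins
          win_rate(fair, system))         # close to 0.5: no edge
```

Both systems beat the alternating coin on every bet, while neither gains an edge on the simulated random coin. A finite run can only try a few systems, whereas the definition speaks of every possible system.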
One immediate conclusion from this interpretation of probability is that there exist no probabilities for single cases. Often in everyday life people will use the term 'probability' to refer to cases that can come about only once: the probability that Obama will be the next US president, for example, or the probability that Elvis is still alive. Similarly, if I have just manufactured a biased coin by gluing some lead to one side, the probability of it landing heads is also undefined, because that particular coin has yet to be thrown.
We can extend our arguments about dice and coins, where the number of throws is potentially infinite, to collectives of events that are not governed by games of chance. For example, we can collect data on all 30-year-old men in one year and see how many of them die within, say, ten years. The number of eligible men is not infinite (not even theoretically), but it can be treated as such. We can then work out the probability of any given 30-year-old man dying within the next decade.
Here the problem is what should make up the relevant collective: men living in Western Europe, for example, will have a significantly lower chance of death than the world average. But restricting the relevant collective to men in one particular country neglects the fact that heavy smokers who never exercise will be more at risk than sporty non-smokers. It also neglects other factors that are known to affect health, such as social class, the quality of the local health authority, and any family history of disease. Each narrowing of the collective makes the assessment more specific to the individual in one way, but the smaller sample sizes make it less reliable in another. If we follow the argument to its logical conclusion and list all the factors relevant to a person's health, we may end up with just one person in our 'relevant collective', in which case the probability is undefined. But how do we decide which factors are relevant and which we should ignore?
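The trade-off can be sketched with an entirely synthetic cohort. In the Python below every attribute, proportion and risk figure is invented for illustration (none comes from real mortality data), and `make_cohort` and `reference_class_estimate` are hypothetical helpers. Each narrower reference class gives a different relative frequency, and each is estimated from fewer people.

```python
import random


def make_cohort(n, seed=0):
    """Synthetic cohort of 30-year-old men. All proportions and risks below
    are invented for illustration; none is real mortality data."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        man = {
            "region": rng.choice(["western_europe", "elsewhere"]),
            "smoker": rng.random() < 0.3,
            "exercises": rng.random() < 0.5,
        }
        # Hypothetical ten-year risk of death: lower in Western Europe,
        # raised by smoking and by never exercising.
        risk = 0.02 if man["region"] == "western_europe" else 0.04
        if man["smoker"]:
            risk *= 2.5
        if not man["exercises"]:
            risk *= 1.5
        man["died_within_10y"] = rng.random() < risk
        cohort.append(man)
    return cohort


def reference_class_estimate(cohort, **conditions):
    """Size of the collective matching every condition, and the relative
    frequency of death within it (None if the collective is empty)."""
    members = [m for m in cohort
               if all(m[key] == value for key, value in conditions.items())]
    if not members:
        return 0, None
    deaths = sum(m["died_within_10y"] for m in members)
    return len(members), deaths / len(members)


cohort = make_cohort(100_000)
for conditions in (
    {},
    {"region": "western_europe"},
    {"region": "western_europe", "smoker": True, "exercises": False},
    {"region": "western_europe", "smoker": False, "exercises": True},
):
    size, freq = reference_class_estimate(cohort, **conditions)
    print(f"{size:>6} men, death frequency {freq:.4f}  {conditions}")
```

Taken to the limit, conditioning on enough factors leaves a class of a single man, whose 'relative frequency' can only be 0 or 1.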
This "reference class problem" affects all statistical inferences from demographic and similar data, and it is one of the more fundamental issues to keep in mind when interpreting most statistical data. It is, however, a particularly pressing philosophical problem for the frequency interpretation, because it seems to imply that probabilities are often not completely objective after all: they depend on which pieces of information we judge relevant.
Another problem for the frequency interpretation is that it allows no single-case probabilities, as mentioned above. Many people find this very unintuitive, and it has been argued that it excludes too many interesting cases from probabilistic analysis.