## Asking statistical questions in sports analytics

RICARDO VALERDI:
Statistics is a branch of mathematics that deals
with the collection, analysis, interpretation, and
presentation of data. The origin of the
word, “statisticum,” from ancient Roman
times, refers to the fact that the government would
collect data on the population in the form of a census in
order to make policy decisions. So the data were for the
benefit of the state. Therefore, this was
known as “statistics.” The modern statistics
that we use today have been around for about
100 years, as old as baseball. And the logical
place to start is with a question or
set of questions that we want answered.

So the objectives of
this particular video are to highlight some of the
characteristics of good statistical questions,
differentiate between statistical and
non-statistical questions– I’ll provide examples for both. And then your job is to write
statistical questions as part of your homework assignment.

So let’s begin with
the two basic criteria for good statistical questions. The first one is that it has to
refer to a specific population. For example, maybe we
look at baseball players during a season. The second criterion has
to do with variability. And this one is a
little bit trickier. But it has to do
with having more than a few possible answers. For example, if we wanted to
look at the Boston Red Sox, a good statistical
question would look at the variability
of the number of home runs throughout a season. And the number of possible
answers for home runs would range from zero
to a couple of hundred. So this is the important notion
of looking at a question where there will be multiple
possible answers that could be analyzed numerically.

So let’s look at a few examples
of statistical questions and determine whether
they are good or not. So looking at the
first question, how does weather affect
field goal percentage at Lambeau Field? So Lambeau Field is where
the Green Bay Packers play. Is there a specific population
being focused on in this case? Yes. It’s the field goal kickers
who play at Lambeau Field. Is there variability? Yes. The answers could
range from 0% to 100% because we’re asking
the question in terms of percentages. OK. So that means that this one is
a good statistical question. It satisfies both tests. The next question–
did Manchester United beat Real Madrid? Well, for this one, it turns out that
the answer to the specific population test is yes. There is a population
of two teams. So it passes that test. However, it fails
the variability test because the only possible
answers are yes or no. Therefore, this is not a
good statistical question because it fails the
variability test. The next one– do
you like sports? Well, it turns out this does
not pass the specific population test because one person
is not a population. And this just refers
to one person. And it also fails
the variability test because there are only two
possible answers, yes and no. So therefore, this is not a
good statistical question. The next one– how many fans
attended the baseball game at Chase Field on June 8, 2017? Yes, the population is fans
that attended the game. And variability– absolutely. The answer could range
from 0 to the capacity of the stadium, which is
just shy of 50,000 seats. Therefore, this is a good
statistical question. Finally, what sports do
football players like to play? Well, it does have a specific
population, football players. However, I would
say that it does not pass the variability
test because, given the population
we’ve selected, the number of possible
responses would be very limited. You would definitely get
football as one of the answers and probably the
most frequent answer. So you wouldn’t really learn
very much about the population.

So I hope this gives
you a sense of what makes a good statistical