## Asking statistical questions in sports analytics

RICARDO VALERDI:

Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of data. The word traces back to the Latin “statisticum,” which relates to matters of state, and the practice behind it is old: in ancient Rome, the government collected data on the population through a census in order to make policy decisions. The data were gathered for the benefit of the state, which is why the field came to be known as “statistics.” The modern statistical methods we use today have been around for only about 100 years, which makes them a bit younger than professional baseball.

And the logical
place to start is with a question, or a set of questions, that we want answered. So the objectives of this particular video are to highlight some of the characteristics of good statistical questions and to differentiate between statistical and non-statistical questions. I’ll provide examples of both. Then your job is to write statistical questions as part of your homework assignment.

So let’s begin with
the two basic criteria for good statistical questions. The first is that the question has to refer to a specific population, for example, baseball players during a season. The second criterion has to do with variability. This one is a little bit trickier, but it means the question has to have more than a few possible answers. For example, if we wanted to look at the Boston Red Sox, a good statistical question would look at the variability in the number of home runs throughout a season. The number of home runs could range from zero to a couple of hundred. That is the important notion: a question with multiple possible answers that can be analyzed numerically.

So let’s look at a few examples
of statistical questions and determine whether they are good or not.

Looking at the first question: how does weather affect field goal percentage at Lambeau Field? Lambeau Field is where the Green Bay Packers play. Is there a specific population being focused on in this case? Yes: the field goal kickers who play at Lambeau Field. Is there variability in the answers? Yes, definitely. Because we’re asking the question in terms of percentages, the answers could range from 0% to 100%. So this one is a good statistical question. It satisfies both tests.

The next question:
did Manchester United beat Real Madrid? This one passes the specific population test, since there is a population of two teams. However, it fails the variability test because the only possible answers are yes or no. Therefore, this is not a good statistical question.

The next one: do
you like sports? This does not pass the specific population test, because it refers to just one person, and one person is not a population. It also fails the variability test because there are only two possible answers, yes and no. So this is not a good statistical question.

The next one: how many fans
attended the baseball game at Chase Field on June 8, 2017? Yes, the population is the fans who attended the game. And variability? Absolutely. The answer could range from 0 to the capacity of the stadium, which is just shy of 50,000 seats. Therefore, this is a good statistical question.

Finally: what sports do
football players like to play? It does have a specific population, football players. However, I would say that it does not pass the variability test. Because of the population we’ve selected, the number of possible responses would be very limited. You would definitely get football as one of the answers, and probably as the most frequent one, so you wouldn’t really learn very much about the population.

I hope this gives you a sense of what makes a good statistical question and what does not. Your next task is to write a series of these to make sure that your understanding of the two tests, population and variability, is rock solid.
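As a practice aid for that homework, the two tests can be written down as a simple checklist. The Python sketch below is a minimal illustration, not a method from the video: the `Question` class, the cutoff `MIN_POSSIBLE_ANSWERS = 5` standing in for “more than a few possible answers,” and the rough answer counts for each example are all hypothetical assumptions chosen so the sketch reproduces the verdicts reached above.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "more than a few possible answers"; the video does
# not give a number, so 5 is an assumption chosen for illustration.
MIN_POSSIBLE_ANSWERS = 5


@dataclass
class Question:
    text: str
    population_is_group: bool  # False when the question targets one person
    possible_answers: int      # rough, assumed count of distinct answers


def passes_population_test(q: Question) -> bool:
    # Test 1: the question must refer to a specific population,
    # and a single person is not a population.
    return q.population_is_group


def passes_variability_test(q: Question) -> bool:
    # Test 2: the question must allow more than a few possible answers.
    return q.possible_answers >= MIN_POSSIBLE_ANSWERS


def is_good_statistical_question(q: Question) -> bool:
    # A good statistical question has to pass both tests.
    return passes_population_test(q) and passes_variability_test(q)


# The five examples from the video; the answer counts are rough assumptions.
examples = [
    Question("How does weather affect field goal percentage at Lambeau Field?",
             True, 101),     # whole percentages from 0% to 100%
    Question("Did Manchester United beat Real Madrid?", True, 2),  # yes/no
    Question("Do you like sports?", False, 2),                     # yes/no
    Question("How many fans attended the baseball game at Chase Field "
             "on June 8, 2017?", True, 50_000),  # 0 up to roughly capacity
    Question("What sports do football players like to play?",
             True, 3),       # assumed: only a handful of likely answers
]

for q in examples:
    verdict = "good" if is_good_statistical_question(q) else "not good"
    print(f"{verdict:>8}: {q.text}")
```

With these assumed counts, the Lambeau Field and Chase Field questions come out as good and the other three do not. The code only records judgments; deciding whether a population is specific and how many answers are plausible still has to be done by reasoning through each question, which is the skill the homework is meant to build.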