I’ve been asked to make the distinct problems clearer, so here they are at the top:
- How to compute the rolling average for a team, excluding the current week
- How to add columns containing similar stats for the opponent team
Here’s the original text:
I’m learning R to do some armchair analysis of sports. Right now, I’m stuck on a problem: I have a list of every game played in an NFL season, and I’m trying to calculate each team’s average total offensive yards (AvgTotalYds) over the weeks leading up to a given game. Eventually, I’d like to be able to compute a season-to-date average as well as a moving average of the past X periods.
A further complication is that I’d also like to get the same info for the opponent leading up to the week in question. I’ve searched a lot for a similar problem, but couldn’t find any solutions.
Below is a sample of the data. The database I was given has some unfortunate column names. ScoreOff actually refers to the total points scored by the team in the TeamName field, whether they came from offensive, defensive, or special teams plays. The *Def columns likewise refer to the Opponent. The code examples use a data frame named “df2.”
dput(head(df2))
structure(list(Date = structure(c(14126, 14126, 14129, 14129,
14129, 14129), class = "Date"), TeamName = structure(c(21L, 32L,
1L, 2L, 3L, 4L), .Label = c("Arizona Cardinals", "Atlanta Falcons",
"Baltimore Ravens", "Buffalo Bills", "Carolina Panthers", "Chicago Bears",
"Cincinnati Bengals", "Cleveland Browns", "Dallas Cowboys", "Denver Broncos",
"Detroit Lions", "Green Bay Packers", "Houston Texans", "Indianapolis Colts",
"Jacksonville Jaguars", "Kansas City Chiefs", "Miami Dolphins",
"Minnesota Vikings", "New England Patriots", "New Orleans Saints",
"New York Giants", "New York Jets", "Oakland Raiders", "Philadelphia Eagles",
"Pittsburgh Steelers", "San Diego Chargers", "San Francisco 49ers",
"Seattle Seahawks", "St Louis Rams", "Tampa Bay Buccaneers",
"Tennessee Titans", "Washington Redskins"), class = "factor"),
ScoreOff = c(16L, 7L, 23L, 34L, 17L, 34L), FirstDownOff = c(21L,
11L, 18L, 23L, 21L, 13L), ThirdDownPctOff = structure(c(34L,
14L, 20L, 21L, 35L, 16L), .Label = c("0%", "10%", "11%",
"12%", "13%", "14%", "15%", "17%", "18%", "19%", "20%", "21%",
"22%", "23%", "24%", "25%", "27%", "29%", "30%", "31%", "33%",
"35%", "36%", "37%", "38%", "40%", "41%", "42%", "43%", "44%",
"45%", "46%", "47%", "50%", "53%", "54%", "55%", "56%", "57%",
"58%", "59%", "60%", "61%", "62%", "63%", "64%", "65%", "67%",
"69%", "73%", "77%", "8%", "80%", "9%", "92%"), class = "factor"),
RushAttOff = c(32L, 24L, 39L, 42L, 46L, 29L), RushYdsOff = c(154L,
84L, 109L, 318L, 229L, 106L), PassAttOff = c(35L, 27L, 30L,
13L, 29L, 31L), PassCompOff = c(19L, 15L, 19L, 9L, 15L, 20L
), PassYdsOff = c(216L, 133L, 197L, 161L, 129L, 234L), PassIntOff = c(1L,
0L, 0L, 0L, 0L, 0L), FumblesOff = c(0L, 0L, 0L, 0L, 2L, 0L
), SackYdsOff = c(16L, 8L, 21L, 5L, 0L, 2L), PenYdsOff = c(70L,
35L, 40L, 68L, 64L, 14L), TimePossOff = structure(c(348L,
52L, 368L, 175L, 354L, 239L), .Label = c("14:45", "18:15",
"18:27", "19:31", "19:56", "20:11", "20:12", "20:26", "20:48",
"21:03", "21:08", "21:16", "21:26", "21:28", "21:35", "21:44",
"21:45", "21:52", "21:54", "22:03", "22:08", "22:12", "22:16",
"22:25", "22:30", "22:31", "22:33", "22:34", "22:38", "22:39",
"22:53", "22:55", "22:59", "23:09", "23:10", "23:12", "23:15",
"23:23", "23:28", "23:30", "23:33", "23:37", "23:38", "23:42",
"23:43", "23:45", "23:48", "23:49", "23:56", "24:06", "24:13",
"24:17", "24:18", "24:21", "24:33", "24:34", "24:35", "24:41",
"24:43", "24:49", "24:50", "24:54", "24:58", "24:59", "25:01",
"25:02", "25:05", "25:11", "25:14", "25:16", "25:19", "25:25",
"25:29", "25:31", "25:32", "25:34", "25:36", "25:37", "25:38",
"25:40", "25:41", "25:46", "25:47", "25:53", "25:55", "25:57",
"25:58", "26:00", "26:04", "26:09", "26:10", "26:11", "26:12",
"26:13", "26:16", "26:20", "26:27", "26:32", "26:36", "26:37",
"26:38", "26:39", "26:40", "26:41", "26:44", "26:46", "26:49",
"26:53", "26:56", "26:59", "27:01", "27:04", "27:10", "27:12",
"27:13", "27:15", "27:18", "27:20", "27:24", "27:25", "27:26",
"27:27", "27:28", "27:30", "27:32", "27:37", "27:40", "27:44",
"27:46", "27:47", "27:48", "27:50", "27:51", "27:52", "27:53",
"27:55", "27:57", "27:58", "27:59", "28:00", "28:01", "28:03",
"28:05", "28:06", "28:07", "28:13", "28:14", "28:16", "28:17",
"28:18", "28:19", "28:21", "28:22", "28:24", "28:25", "28:28",
"28:29", "28:32", "28:38", "28:40", "28:41", "28:45", "28:47",
"28:49", "28:51", "28:53", "28:55", "28:57", "28:58", "28:59",
"29:00", "29:02", "29:04", "29:05", "29:07", "29:08", "29:11",
"29:13", "29:14", "29:18", "29:19", "29:20", "29:26", "29:27",
"29:29", "29:31", "29:32", "29:33", "29:34", "29:36", "29:37",
"29:38", "29:41", "29:42", "29:43", "29:49", "29:50", "29:55",
"29:56", "29:59", "30:01", "30:04", "30:05", "30:10", "30:11",
"30:17", "30:18", "30:19", "30:22", "30:23", "30:24", "30:26",
"30:27", "30:28", "30:29", "30:31", "30:33", "30:34", "30:40",
"30:41", "30:42", "30:46", "30:47", "30:49", "30:52", "30:53",
"30:55", "30:58", "31:00", "31:01", "31:02", "31:03", "31:05",
"31:07", "31:09", "31:11", "31:13", "31:15", "31:19", "31:20",
"31:22", "31:28", "31:31", "31:32", "31:35", "31:36", "31:38",
"31:39", "31:41", "31:42", "31:43", "31:44", "31:46", "31:47",
"31:53", "31:54", "31:55", "31:57", "31:59", "32:00", "32:01",
"32:02", "32:03", "32:05", "32:07", "32:08", "32:09", "32:10",
"32:12", "32:13", "32:14", "32:16", "32:20", "32:23", "32:28",
"32:30", "32:32", "32:33", "32:34", "32:35", "32:36", "32:40",
"32:42", "32:45", "32:47", "32:48", "32:50", "32:56", "32:59",
"33:01", "33:04", "33:07", "33:11", "33:14", "33:16", "33:19",
"33:20", "33:21", "33:22", "33:23", "33:24", "33:28", "33:33",
"33:40", "33:44", "33:47", "33:48", "33:49", "33:50", "33:51",
"33:56", "34:00", "34:02", "34:03", "34:05", "34:07", "34:13",
"34:14", "34:19", "34:20", "34:22", "34:23", "34:24", "34:26",
"34:28", "34:29", "34:31", "34:35", "34:41", "34:44", "34:46",
"34:49", "34:55", "34:58", "34:59", "35:01", "35:02", "35:06",
"35:10", "35:11", "35:17", "35:19", "35:25", "35:26", "35:27",
"35:39", "35:42", "35:43", "35:47", "35:54", "36:04", "36:11",
"36:12", "36:15", "36:17", "36:18", "36:22", "36:23", "36:27",
"36:30", "36:32", "36:37", "36:45", "36:48", "36:50", "36:51",
"37:01", "37:05", "37:07", "37:21", "37:22", "37:26", "37:27",
"37:29", "37:30", "37:35", "37:42", "37:44", "37:48", "37:52",
"37:57", "38:08", "38:15", "38:16", "38:23", "38:25", "38:32",
"38:34", "38:44", "38:52", "38:57", "39:12", "39:34", "39:48",
"39:49", "40:04", "40:29", "41:33", "41:45", "45:15"), class = "factor"),
PuntAvgOff = c(36.3, 37.9, 45, 38.3, 48.2, 46.6), Opponent = structure(c(32L,
21L, 27L, 11L, 7L, 28L), .Label = c("Arizona Cardinals",
"Atlanta Falcons", "Baltimore Ravens", "Buffalo Bills", "Carolina Panthers",
"Chicago Bears", "Cincinnati Bengals", "Cleveland Browns",
"Dallas Cowboys", "Denver Broncos", "Detroit Lions", "Green Bay Packers",
"Houston Texans", "Indianapolis Colts", "Jacksonville Jaguars",
"Kansas City Chiefs", "Miami Dolphins", "Minnesota Vikings",
"New England Patriots", "New Orleans Saints", "New York Giants",
"New York Jets", "Oakland Raiders", "Philadelphia Eagles",
"Pittsburgh Steelers", "San Diego Chargers", "San Francisco 49ers",
"Seattle Seahawks", "St Louis Rams", "Tampa Bay Buccaneers",
"Tennessee Titans", "Washington Redskins"), class = "factor"),
ScoreDef = c(7L, 16L, 13L, 21L, 10L, 10L), FirstDownDef = c(11L,
21L, 13L, 21L, 8L, 16L), ThirdDownPctDef = structure(c(14L,
34L, 25L, 13L, 7L, 10L), .Label = c("0%", "10%", "11%", "12%",
"13%", "14%", "15%", "17%", "18%", "19%", "20%", "21%", "22%",
"23%", "24%", "25%", "27%", "29%", "30%", "31%", "33%", "35%",
"36%", "37%", "38%", "40%", "41%", "42%", "43%", "44%", "45%",
"46%", "47%", "50%", "53%", "54%", "55%", "56%", "57%", "58%",
"59%", "60%", "61%", "62%", "63%", "64%", "65%", "67%", "69%",
"73%", "77%", "8%", "80%", "9%", "92%"), class = "factor"),
RushAttDef = c(24L, 32L, 20L, 21L, 23L, 21L), RushYdsDef = c(84L,
154L, 108L, 62L, 65L, 85L), PassAttDef = c(27L, 35L, 20L,
33L, 25L, 41L), PassCompDef = c(15L, 19L, 14L, 24L, 10L,
17L), PassYdsDef = c(133L, 216L, 195L, 262L, 99L, 190L),
PassIntDef = c(0L, 1L, 1L, 1L, 1L, 1L), FumblesDef = c(0L,
0L, 4L, 0L, 1L, 1L), SackYdsDef = c(8L, 16L, 12L, 16L, 10L,
23L), PenYdsDef = c(35L, 70L, 20L, 30L, 40L, 30L), TimePossDef = structure(c(52L,
348L, 32L, 225L, 46L, 161L), .Label = c("14:45", "18:15",
"18:27", "19:31", "19:56", "20:11", "20:12", "20:26", "20:48",
"21:03", "21:08", "21:16", "21:26", "21:28", "21:35", "21:44",
"21:45", "21:52", "21:54", "22:03", "22:08", "22:12", "22:16",
"22:25", "22:30", "22:31", "22:33", "22:34", "22:38", "22:39",
"22:53", "22:55", "22:59", "23:09", "23:10", "23:12", "23:15",
"23:23", "23:28", "23:30", "23:33", "23:37", "23:38", "23:42",
"23:43", "23:45", "23:48", "23:49", "23:56", "24:06", "24:13",
"24:17", "24:18", "24:21", "24:33", "24:34", "24:35", "24:41",
"24:43", "24:49", "24:50", "24:54", "24:58", "24:59", "25:01",
"25:02", "25:05", "25:11", "25:14", "25:16", "25:19", "25:25",
"25:29", "25:31", "25:32", "25:34", "25:36", "25:37", "25:38",
"25:40", "25:41", "25:46", "25:47", "25:53", "25:55", "25:57",
"25:58", "26:00", "26:04", "26:09", "26:10", "26:11", "26:12",
"26:13", "26:16", "26:20", "26:27", "26:32", "26:36", "26:37",
"26:38", "26:39", "26:40", "26:41", "26:44", "26:46", "26:49",
"26:53", "26:56", "26:59", "27:01", "27:04", "27:10", "27:12",
"27:13", "27:15", "27:18", "27:20", "27:24", "27:25", "27:26",
"27:27", "27:28", "27:30", "27:32", "27:37", "27:40", "27:44",
"27:46", "27:47", "27:48", "27:50", "27:51", "27:52", "27:53",
"27:55", "27:57", "27:58", "27:59", "28:00", "28:01", "28:03",
"28:05", "28:06", "28:07", "28:13", "28:14", "28:16", "28:17",
"28:18", "28:19", "28:21", "28:22", "28:24", "28:25", "28:28",
"28:29", "28:32", "28:38", "28:40", "28:41", "28:45", "28:47",
"28:49", "28:51", "28:53", "28:55", "28:57", "28:58", "28:59",
"29:00", "29:02", "29:05", "29:07", "29:08", "29:11", "29:13",
"29:14", "29:18", "29:19", "29:20", "29:26", "29:27", "29:29",
"29:31", "29:32", "29:33", "29:34", "29:36", "29:37", "29:38",
"29:41", "29:42", "29:43", "29:49", "29:50", "29:55", "29:56",
"29:59", "30:01", "30:04", "30:05", "30:10", "30:11", "30:17",
"30:18", "30:19", "30:22", "30:23", "30:24", "30:26", "30:27",
"30:28", "30:29", "30:31", "30:33", "30:34", "30:40", "30:41",
"30:42", "30:46", "30:47", "30:49", "30:52", "30:53", "30:55",
"30:56", "30:58", "31:00", "31:01", "31:02", "31:03", "31:05",
"31:07", "31:09", "31:11", "31:13", "31:15", "31:19", "31:20",
"31:22", "31:28", "31:31", "31:32", "31:35", "31:36", "31:38",
"31:39", "31:41", "31:42", "31:43", "31:44", "31:46", "31:47",
"31:53", "31:54", "31:55", "31:57", "31:59", "32:00", "32:01",
"32:02", "32:03", "32:05", "32:07", "32:08", "32:09", "32:10",
"32:12", "32:13", "32:14", "32:16", "32:20", "32:23", "32:28",
"32:30", "32:32", "32:33", "32:34", "32:35", "32:36", "32:40",
"32:42", "32:45", "32:47", "32:48", "32:50", "32:56", "32:59",
"33:01", "33:04", "33:07", "33:11", "33:14", "33:16", "33:19",
"33:20", "33:21", "33:22", "33:23", "33:24", "33:28", "33:33",
"33:40", "33:44", "33:47", "33:48", "33:49", "33:50", "33:51",
"33:56", "34:00", "34:02", "34:03", "34:05", "34:07", "34:13",
"34:14", "34:19", "34:20", "34:22", "34:23", "34:24", "34:26",
"34:28", "34:29", "34:31", "34:35", "34:41", "34:44", "34:46",
"34:49", "34:55", "34:58", "34:59", "35:01", "35:02", "35:06",
"35:10", "35:11", "35:17", "35:19", "35:25", "35:26", "35:27",
"35:39", "35:42", "35:43", "35:47", "35:54", "36:04", "36:11",
"36:12", "36:15", "36:17", "36:18", "36:22", "36:23", "36:27",
"36:30", "36:32", "36:37", "36:45", "36:48", "36:50", "36:51",
"37:01", "37:05", "37:07", "37:21", "37:22", "37:26", "37:27",
"37:29", "37:30", "37:35", "37:42", "37:44", "37:48", "37:52",
"37:57", "38:08", "38:15", "38:16", "38:23", "38:25", "38:32",
"38:34", "38:44", "38:52", "38:57", "39:12", "39:34", "39:48",
"39:49", "40:04", "40:29", "41:33", "41:45", "45:15"), class = "factor"),
Site = structure(c(1L, 3L, 3L, 1L, 1L, 1L), .Label = c("H",
"N", "V"), class = "factor"), Line = c(4.5, -4.5, 2.5, -3,
-2, 1), Totalline = c(41.5, 41.5, 42, 41, 37.5, 38.5), TotalYdsOff = c(370L,
217L, 306L, 479L, 358L, 340L), TotalYdsDef = c(217L, 370L,
303L, 324L, 164L, 275L), ActualLine = c(-9L, 9L, -10L, -13L,
-7L, -24L)), .Names = c("Date", "TeamName", "ScoreOff", "FirstDownOff",
"ThirdDownPctOff", "RushAttOff", "RushYdsOff", "PassAttOff",
"PassCompOff", "PassYdsOff", "PassIntOff", "FumblesOff", "SackYdsOff",
"PenYdsOff", "TimePossOff", "PuntAvgOff", "Opponent", "ScoreDef",
"FirstDownDef", "ThirdDownPctDef", "RushAttDef", "RushYdsDef",
"PassAttDef", "PassCompDef", "PassYdsDef", "PassIntDef", "FumblesDef",
"SackYdsDef", "PenYdsDef", "TimePossDef", "Site", "Line", "Totalline",
"TotalYdsOff", "TotalYdsDef", "ActualLine"), row.names = c(NA,
6L), class = "data.frame")
I added the TotalYds[Off|Def] columns, as that was trivial to do. The closest I got to properly calculating a moving average used the zoo and plyr libraries and the following command:
ddply(df2, .(TeamName), summarise, rollmean(TotalYdsOff, k=4, fill=0, align="right"))
This almost does what I want, except that it includes the current week’s data in the average.
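For illustration, here is a minimal base-R sketch of averages that use only the games before the current one. The helper names prior_season_mean and prior_roll_mean, the toy data frame games, and its values are assumptions made up for this example; only the column names TeamName, Date, and TotalYdsOff come from the data above. It assumes the rows are sorted by date within each team.

```r
# Season-to-date mean of all PRIOR games (current game excluded).
# The first game of a team has no history, so it gets NA.
prior_season_mean <- function(x) {
  n_prior <- seq_along(x) - 1
  out <- (cumsum(x) - x) / n_prior
  out[n_prior == 0] <- NA
  out
}

# Mean of the k games immediately BEFORE the current one.
# Rows with fewer than k prior games get NA.
prior_roll_mean <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i > k) out[i] <- mean(x[(i - k):(i - 1)])
  }
  out
}

# Toy data (made up): two teams, four games each.
games <- data.frame(
  Date = as.Date(rep(c("2008-09-07", "2008-09-14",
                       "2008-09-21", "2008-09-28"), times = 2)),
  TeamName = rep(c("A", "B"), each = 4),
  TotalYdsOff = c(300, 400, 200, 500, 100, 200, 300, 400)
)

# Sort by team, then date, so each team's games are in chronological order.
games <- games[order(games$TeamName, games$Date), ]

# ave() applies the helper within each team and keeps the row order.
games$SeasonAvgPrior <- ave(games$TotalYdsOff, games$TeamName,
                            FUN = prior_season_mean)
games$Roll2Prior <- ave(games$TotalYdsOff, games$TeamName,
                        FUN = function(x) prior_roll_mean(x, 2))
print(games)
```

The NA values mark games without enough history; whether to keep them or substitute something else is a judgment call that depends on the analysis.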
As for getting the matching information for the opponent, I was thinking there’d be a way to pull the same data from the row whose “TeamName” and “Date” match the current row’s “Opponent” and “Date.” This is because the database has two entries for a given game, one for the home team and one for the away team (with *Off and *Def swapped). Look at rows 1 and 2 in the example data, specifically Date, TeamName, and Opponent, and you’ll see what I mean.
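The row lookup described here could be sketched with match() on a combined Date/team key. This is only an illustration: the toy data frame games, its values, and the new column name OppAvgTotalYdsOff are made up, and AvgTotalYdsOff is assumed to already hold each team's pre-game average.

```r
# Toy data (made up): two games on one date, one row per team per game.
games <- data.frame(
  Date = as.Date(rep("2008-09-07", 4)),
  TeamName = c("A", "B", "C", "D"),
  Opponent = c("B", "A", "D", "C"),
  AvgTotalYdsOff = c(310, 280, 350, 295)
)

# Build one key for "this team on this date" and one for
# "this team's opponent on this date".
key_self <- paste(games$Date, games$TeamName)
key_opp  <- paste(games$Date, games$Opponent)

# For every row, find the row belonging to the opponent in the same game
# and copy its value. Rows without a matching opponent row would get NA.
games$OppAvgTotalYdsOff <- games$AvgTotalYdsOff[match(key_opp, key_self)]
print(games)
```

A self-join with merge() on Date and Opponent/TeamName would probably work as well, but merge() reorders rows by default, whereas match() leaves the data frame in its existing order.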
Any guidance here? I imagine this is relatively trivial for someone with more than a few days tinkering in R, who would know of some function or library that does this. I, however, am only a few days in, and thus am having some trouble.
For now, I ended up creating a function to calculate the season average up to (but not including) a given game and putting the results in a separate vector, then just using cbind() to add it to the data frame:
df <- cbind(df[order(df$TeamName), ], AvgTotalYdsOff = foo)
As you can see, I reordered the df by TeamName (the secondary key would be Date, which it was already ordered by) to make sure the rows match up.
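A slightly more explicit variant of this step, sketched under the assumption that the precomputed vector is in team-then-date order: sorting on both keys avoids relying on the existing date order, and assigning the column directly avoids cbind(). The toy data frame games and the values in foo are made up for illustration.

```r
# Toy data (made up), deliberately not sorted by team.
games <- data.frame(
  Date = as.Date(c("2008-09-07", "2008-09-07", "2008-09-14", "2008-09-14")),
  TeamName = c("B", "A", "B", "A"),
  TotalYdsOff = c(100, 300, 200, 400)
)

# Sort by team first, then by date within each team.
games <- games[order(games$TeamName, games$Date), ]

# Hypothetical precomputed averages, one per row, in the same team/date order.
foo <- c(NA, 300, NA, 100)

# Attach the vector as a new column.
games$AvgTotalYdsOff <- foo
print(games)
```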
To get the info from the corresponding row (the one for the other team in the game), I did a loop and put everything in a vector, then another cbind():
In the end, I went with the simpler, cruder route, as I didn’t know of a better alternative. I hope this helps someone with a similar problem in the future.