L-Jimmy wrote: I posted this in the Captain's poll thread and am cross-posting it here... because it took a while to write.
@Ben Marlin @Rippin and Tearin -----------
Ben Marlin wrote:
Thanks mate
How do you suggest that I could add depth and controls?
Sure, happy to help! I'll try and update this during the day, between meetings. This here is a helpful guide from a good source: https://eml.berkeley.edu/~dromer/papers/Econometrics%20Jan%202018.pdf
OK, so this is going to be a very short discussion of regression.
You are interested in understanding the relationships within a set of observations, and then using them to predict an outcome.
Each observation consists of a set of variables, which you seem to understand well intuitively, e.g. Dude[z] played Minutes[x] against Team[y] in Position[p] in Round[r] with HIAtime[h] and scored ScoringPoints[s].
Another way of saying this is that you have a model of what determines a player's scoring:
S = z + x + y + p + r + h
But that only works for one dude in one game, and it doesn't make sense even then: if we give positions a number from 1 to 17, the model above says that reserves (14-17) will score more. So we need to weight these things. In game t we have capital-letter weights:
S(t) = D(z) + M(x) + T(y) + P(p) + R(r) + H(h)
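A tiny Python sketch of the problem above, with made-up numbers: adding the raw position code means a higher number always adds points, whereas giving each position its own weight P(p) lets the data decide. `naive_score`, `POSITION_WEIGHT`, `weighted_position_term` and the weight values are all hypothetical, purely for illustration.

```python
# Hypothetical illustration: why raw category codes can't just be added up.
# Positions are coded 1-17 (14-17 = reserves), as in the post.

def naive_score(position_code: int, other_terms: float) -> float:
    """Naive model S = z + x + y + p + r + h: the position code is added as-is."""
    return other_terms + position_code

# Holding everything else fixed, a reserve (14) 'beats' a fullback (1) by 13 points,
# purely because of how the positions happen to be numbered.
print(naive_score(14, 50.0) - naive_score(1, 50.0))  # 13.0

# Weighted version: each position gets its own weight P(p), looked up rather than added raw.
# These values are invented; in practice they are estimated from the data.
POSITION_WEIGHT = {1: 6.0, 9: 10.0, 14: -8.0}  # hypothetical P(p) values

def weighted_position_term(position_code: int) -> float:
    """Return the (hypothetical) weight P(p) for a position code."""
    return POSITION_WEIGHT[position_code]

print(weighted_position_term(14) < weighted_position_term(1))  # True
```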
But the model's guesses won't be perfectly right, because sometimes a Dude will be tired, or really motivated, or have an undisclosed niggle, or play for Des Hasler. So we add an error term to the model:
S(t) = D(z) + M(x) + T(y) + P(p) + R(r) + H(h) + e(t)
So, if Cook scores 65 against the Dragons, at hooker, for 80 minutes, with no HIA, in Round 19:
65 = D(Cookie, or player number 71 [we have to give a number]) + M(80) + T(14 [a number for Dragons]) + P(9) + R(19) + H(0) + e
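Here's that Cook game as a Python sketch. Every weight is invented, and so is the choice to treat minutes and HIA time as linear (a fixed number of points per minute); `predicted_score` and the weight names are hypothetical. The point is just that e is whatever is left over once the weights have had their say.

```python
# All weights below are invented for illustration; none are estimated from real data.
D = {71: 30.0}          # player weight D(z); 71 = Cook, as in the post
M_PER_MINUTE = 0.25     # minutes weight: M(x) = 0.25 * x (assumed linear)
T = {14: 2.0}           # opponent weight T(y); 14 = Dragons
P = {9: 5.0}            # position weight P(p); 9 = hooker
R = {19: 0.0}           # round weight R(r)
H_PER_MINUTE = -0.5     # HIA weight: H(h) = -0.5 * h (assumed linear)


def predicted_score(z, x, y, p, r, h):
    """The systematic part of S(t): the sum of the capital-letter weights."""
    return D[z] + M_PER_MINUTE * x + T[y] + P[p] + R[r] + H_PER_MINUTE * h


actual = 65.0
prediction = predicted_score(z=71, x=80, y=14, p=9, r=19, h=0)
error = actual - prediction  # e(t): whatever the model fails to explain
print(prediction, error)  # 57.0 8.0
```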
The next trick is to work out how big those capital-letter weights need to be. A common way is to put all our observations into a matrix and choose the weights that minimise the sum of squared errors: ordinary least squares (OLS).
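A minimal OLS sketch using NumPy, on synthetic games generated from known "true" weights, so you can see least squares recover them. The variables (minutes, HIA minutes, a 0/1 hooker dummy), the weights and the noise level are all assumptions for illustration; `np.linalg.lstsq` solves the minimise-the-squared-errors problem directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_games = 500

# Synthetic data (invented for illustration): minutes played, HIA minutes, hooker dummy.
minutes = rng.uniform(40, 80, n_games)
hia = rng.choice([0.0, 10.0], size=n_games, p=[0.9, 0.1])
is_hooker = rng.integers(0, 2, n_games).astype(float)  # 1 if played hooker, else 0

# Hypothetical 'true' weights: intercept, per-minute, per-HIA-minute, hooker bonus.
true_weights = np.array([5.0, 0.6, -1.5, 8.0])
X = np.column_stack([np.ones(n_games), minutes, hia, is_hooker])
scores = X @ true_weights + rng.normal(0, 10, n_games)  # add the error term e(t)

# OLS: choose the weights that minimise the sum of squared errors ||scores - X w||^2.
weights, *_ = np.linalg.lstsq(X, scores, rcond=None)
print(np.round(weights, 2))
```

With real data, each row of `X` would be one player-game and `scores` the actual points; player, opponent, position and round would each become a set of 0/1 dummy columns (dropping one category per variable as the baseline) rather than a single number.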
Now, on to your question:
1) The more observations we have, the better our estimate of each capital-letter weight will be.
1a) This means that a calculation including all available games from Cook, Farah, Teddy, Wighton and BS9 will likely be more informative than one using just Cook's, because we learn more about each weight with each observation.
2) We also need to think about how many explanatory variables (e.g. H, P, M) to use: too many can make the results meaningless, and too few misses important stuff. Where to draw the line is a matter of art and science.
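To see point 1 in action, this sketch (synthetic data, invented "true" weight) fits the same minutes-only regression many times and measures how much the estimated weight jumps around with 20 games versus 200. It assumes the minutes weight is the same for every pooled player, which is exactly the assumption you make when you pool; `estimate_slope` and `spread_of_estimates` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_SLOPE = 0.6  # invented 'true' points-per-minute weight


def estimate_slope(n_games: int) -> float:
    """Simulate n_games, fit OLS with an intercept, return the estimated minutes weight."""
    minutes = rng.uniform(40, 80, n_games)
    scores = 5.0 + TRUE_SLOPE * minutes + rng.normal(0, 10, n_games)
    X = np.column_stack([np.ones(n_games), minutes])
    weights, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return float(weights[1])


def spread_of_estimates(n_games: int, n_repeats: int = 300) -> float:
    """Standard deviation of the estimated weight across repeated simulated samples."""
    return float(np.std([estimate_slope(n_games) for _ in range(n_repeats)]))


small = spread_of_estimates(20)   # e.g. roughly one player's season
large = spread_of_estimates(200)  # e.g. pooling several players' games
print(f"spread with 20 games: {small:.3f}, with 200 games: {large:.3f}")
```

The spread with 200 games should come out at roughly a third of the spread with 20, in line with the usual rule that uncertainty shrinks with the square root of the number of observations.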
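And for point 2, a sketch of what "too many" looks like: adding pure-noise variables (hypothetical "junk" columns that have nothing to do with scoring) makes the fit on the games you estimated from look better, but makes predictions for new games worse. This is usually called overfitting; the sample sizes, noise level and helper names (`make_data`, `fit_and_score`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test = 40, 1000


def make_data(n, n_junk):
    """Synthetic games: scores depend on minutes only; the junk columns are pure noise."""
    minutes = rng.uniform(40, 80, n)
    junk = rng.normal(size=(n, n_junk))
    scores = 5.0 + 0.6 * minutes + rng.normal(0, 10, n)
    X = np.column_stack([np.ones(n), minutes, junk])
    return X, scores


def fit_and_score(n_junk):
    """Fit OLS on training games; return mean squared error on training and new games."""
    X_tr, y_tr = make_data(n_train, n_junk)
    X_te, y_te = make_data(n_test, n_junk)
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    train_mse = float(np.mean((y_tr - X_tr @ w) ** 2))
    test_mse = float(np.mean((y_te - X_te @ w) ** 2))
    return train_mse, test_mse


for n_junk in (0, 10, 30):
    tr, te = fit_and_score(n_junk)
    print(f"{n_junk:2d} junk variables: train MSE {tr:6.1f}, test MSE {te:6.1f}")
```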
Happy to discuss more, but the Romer link is a good place to start. If you're really keen, there's a good uni near you, or lots of great free stuff online (MITx etc).
Cheers, and happy econometrics!