Six Nations Fantasy: Predicting Player Performance with AI¶
Every year, around mid-January, the first post pops up on my feed about “The Greatest Championship,” “Rugby’s Grand Slam,” or, as most people call it, The Guinness Six Nations.
I first get a great feeling of excitement knowing I will soon be watching five rounds of great rugby all through February and March. This excitement is quickly followed by even greater anticipation for the tournament’s Fantasy Game.
I pride myself on having more (rugby) ball knowledge than my friends, and the fantasy game is always a great opportunity to prove it. My friends and I have been playing this game since 2019. The only reason I remember this so clearly is that every single week that year, our fantasy teams had one thing in common: we picked as many Welsh players as possible (that hasn’t happened much since then).
I normally do quite well and, bar one year, I have always managed to at least beat my friends. But this year, I wanted to up my game. I didn’t just want to beat my friends; I wanted to compete in the overall rankings. So I decided to see if I could combine my knowledge of rugby with that of an AI model to try to predict the optimal squad each week.
If you’re interested in AI, rugby, or just need help picking your fantasy team for the upcoming round, I hope you enjoy this piece.
Understanding the Rules of the Game¶
If you’re new to the Six Nations Fantasy game or just need a refresher, here’s how it works:
General Rules¶
- You pick 15 players
- Your team can have up to 3 back threes, 2 centres, 1 fly-half, 1 scrum-half, 3 back rows, 2 second-rows, 2 props and 1 hooker
- You can pick one substitute (optional)
- You can only select a maximum of 4 players from the same country
- Each player is assigned a star value based on their reputation and likely performance in the game
- You have 230 stars to build your team
- Pick a captain from your starting 15; his total points are multiplied by 2
- Your sub, or Supersub as the game calls it, has his points multiplied by 3 if he starts on the bench and comes on during the game
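The selection rules above can be turned into a small checker. This is only a sketch: the position codes are the ones used in the dataset later in this article, the star values in the example are made up, and applying the country limit to the 15 starters only is an assumption.

```python
from collections import Counter

# Position codes follow the dataset used later in this article.
POSITION_GROUP = {
    "W": "back three", "FB": "back three",
    "C": "centre", "FH": "fly-half", "SH": "scrum-half",
    "FL": "back row", "N8": "back row",
    "L": "second row", "P": "prop", "H": "hooker",
}
GROUP_LIMITS = {
    "back three": 3, "centre": 2, "fly-half": 1, "scrum-half": 1,
    "back row": 3, "second row": 2, "prop": 2, "hooker": 1,
}
MAX_PER_COUNTRY = 4
STAR_BUDGET = 230


def check_team(team):
    """Return a list of rule violations for a list of player dicts.

    Each player dict needs "name", "country", "position" and "stars".
    """
    problems = []
    if len(team) != 15:
        problems.append(f"need 15 starters, got {len(team)}")

    groups = Counter(POSITION_GROUP[p["position"]] for p in team)
    for group, limit in GROUP_LIMITS.items():
        if groups[group] > limit:
            problems.append(f"too many {group} players: {groups[group]} > {limit}")

    countries = Counter(p["country"] for p in team)
    for country, count in countries.items():
        if count > MAX_PER_COUNTRY:
            problems.append(f"too many players from {country}: {count}")

    stars = sum(p["stars"] for p in team)
    if stars > STAR_BUDGET:
        problems.append(f"over budget: {stars} > {STAR_BUDGET} stars")
    return problems


# Hypothetical team: placeholder names and a made-up 15 stars per player.
positions = ["P", "H", "P", "L", "L", "FL", "FL", "N8",
             "SH", "FH", "W", "C", "C", "W", "FB"]
countries = ["Ireland", "France", "England", "Scotland", "Italy", "Wales"]
team = [
    {"name": f"Player {i}", "country": countries[i % 6], "position": pos, "stars": 15}
    for i, pos in enumerate(positions)
]
print(check_team(team))  # an empty list means no rule is broken
```

A checker like this is handy for sanity-checking any automatically generated squad before entering it in the game.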
Points System¶
- Players earn points based on their performances.
- Each player’s points are then added up to create your total team score.
- Displayed here are the various actions that give or in some cases take away points:
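As a rough illustration, the point values used in the feature-engineering code later in this article can be collected into a single function. The function name and the dictionary keys are my own choices for this sketch, and it covers only the actions that code scores. The example stat line is T Freeman's week 5 row from the dataset preview below.

```python
def fantasy_points(stats, is_back):
    """Convert one player's match stats into fantasy points."""
    points = stats.get("tries", 0) * (10 if is_back else 15)
    points += stats.get("assists", 0) * 4
    points += stats.get("conversions", 0) * 2
    points += stats.get("penalties", 0) * 3
    points += stats.get("drop_goals", 0) * 5
    points += stats.get("defenders_beaten", 0) * 2
    points += stats.get("metres_carried", 0) // 10  # 1 point per full 10 metres
    points += stats.get("offloads", 0) * 2
    points += stats.get("tackles", 0) * 1
    points -= stats.get("penalties_conceded", 0) * 1
    points -= stats.get("yellow_cards", 0) * 5
    points -= stats.get("red_cards", 0) * 8
    return points


# T Freeman (wing), England v France, week 5 of the 2024 tournament.
freeman = {
    "tries": 1, "metres_carried": 73, "defenders_beaten": 3,
    "tackles": 5, "penalties_conceded": 1,
}
base = fantasy_points(freeman, is_back=True)
print(base)      # 27  (10 + 7 + 6 + 5 - 1)
print(base * 2)  # 54 if he had been picked as captain
```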
Alright, now that we’ve got the rules sorted, let’s dive in and see what the AI can do!
Installing packages¶
Like any project, I start by installing and importing all the necessary libraries:
!pip install -Uqq fastai
import pandas as pd
from fastai.tabular.all import *
import warnings
warnings.simplefilter("ignore", category=UserWarning)
import torch, numpy as np
from IPython.display import display, HTML
from sklearn.preprocessing import StandardScaler
The Dataset¶
I have created a dataset that contains all of the player stats from last year’s tournament (the 2024 edition).
First, I import the dataset:
df = pd.read_csv("df_fantasy_stats_2024.csv")
I can then take a quick look at it and show you a preview of its structure and content:
display(df)
| Player | Country | Opposition | Home/Away | Season | Week | Try | Assists | Conversion | Penalty | Drop Goal | Metres carried | Defenders Beaten | Offloads | Tackles | Lineouts Steal | Conceded penalty | Yellow Cards | Red Cards | Position |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | A Porter | Ireland | France | Away | 2024 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 7 | 0 | 5 | 0 | 0 | P |
1 | B Aki | Ireland | France | Away | 2024 | 1 | 0 | 1 | 0 | 0 | 0 | 30 | 2 | 1 | 5 | 0 | 0 | 0 | 0 | C |
2 | C Baille | France | Ireland | Home | 2024 | 1 | 0 | 0 | 0 | 0 | 0 | 6 | 3 | 0 | 9 | 0 | 1 | 0 | 0 | P |
3 | C Doris | Ireland | France | Away | 2024 | 1 | 0 | 1 | 0 | 0 | 0 | 37 | 1 | 0 | 8 | 0 | 0 | 0 | 0 | N8 |
4 | C Frawley | Ireland | France | Away | 2024 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | R |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
699 | T Freeman | England | France | Away | 2024 | 5 | 1 | 0 | 0 | 0 | 0 | 73 | 3 | 0 | 5 | 0 | 1 | 0 | 0 | W |
700 | T Ramos | France | England | Home | 2024 | 5 | 0 | 0 | 3 | 4 | 0 | 19 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | FB |
701 | U Atonio | France | England | Home | 2024 | 5 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 7 | 0 | 2 | 0 | 0 | P |
702 | W Stuart | England | France | Away | 2024 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 1 | 0 | 0 | R |
703 | Y Moefana | France | England | Home | 2024 | 5 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | R |
704 rows × 20 columns
The dataset has only 704 rows and covers a single season, so it is relatively small.
Preprocessing, Cleaning, Feature Engineering¶
Thanks to the Fastai library, a lot of the preprocessing and cleaning is done for me (we will see this in a moment). This is really useful, as it lets me concentrate on the features I think might help my model rather than doing all the heavy lifting manually.
Feature engineering¶
I define a few initial features:
- Back vs Forward distinction (since tries give different points to each role)
- Remove substitutes (for now)
- Convert each stat into its fantasy point value
- I log-transform the tackle points and normalize the stats so that a player’s tackle count (which can go into the tens or twenties per game) doesn’t overshadow other stats like tries or conversions. This prevents any single stat from dominating the model
- Assign team ranking scores based on world rankings
- Compute a “team strength difference”
- Standardize fantasy points to zero mean and unit variance (for training efficiency)
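The scaling steps rely on scikit-learn's StandardScaler and NumPy's log1p in the code that follows. StandardScaler rescales values to zero mean and unit variance rather than to a fixed range, so standardized values can be negative. Here is a minimal standard-library sketch of both transforms, using made-up numbers purely for illustration:

```python
import math
import statistics

points = [4, 12, 27, 35, 50]  # made-up fantasy totals, for illustration only

mean = statistics.fmean(points)
std = statistics.pstdev(points)  # population std, which is what StandardScaler uses

# Standardize, then undo the transform to get the real figures back.
z_scores = [(p - mean) / std for p in points]
restored = [z * std + mean for z in z_scores]

# log1p(x) = log(1 + x) compresses large counts and keeps 0 at 0.
tackles = [0, 5, 10, 20]
log_tackles = [math.log1p(t) for t in tackles]

print([round(z, 2) for z in z_scores])
print([round(r, 2) for r in restored])
print([round(v, 2) for v in log_tackles])
```

Undoing the transform is what lets predictions be reported in real fantasy points at the end.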
def add_features(df):
    # Define forward and back positions
    forwards = ["P", "H", "L", "FL", "N8"]  # Props, Hookers, Locks, Flankers, Number 8
    backs = ["SH", "FH", "C", "W", "FB"]  # Scrum-half, Fly-half, Centres, Wings, Fullback
    # Classify players as Forward or Back
    def classify_position(pos):
        if pos in forwards:
            return "Forward"
        elif pos in backs:
            return "Back"
        else:
            return "Unknown"
    df["Player Type"] = df["Position"].apply(classify_position)
    # For now, remove rows where "Player Type" is "Unknown" (which are subs)
    df = df[df["Player Type"] != "Unknown"].copy()
    # Apply scoring rules
    df["TryPoints"] = np.where(df["Player Type"] == "Back", df["Try"] * 10, df["Try"] * 15)
    df["AssistPoints"] = df["Assists"] * 4
    df["ConversionPoints"] = df["Conversion"] * 2
    df["PenaltyPoints"] = df["Penalty"] * 3
    df["DropGoalPoints"] = df["Drop Goal"] * 5
    df["DefendersBeatenPoints"] = df["Defenders Beaten"] * 2
    df["MetresCarriedPoints"] = (df["Metres carried"] // 10) * 1
    df["OffloadsPoints"] = df["Offloads"] * 2
    df["TacklesPoints"] = df["Tackles"] * 1
    df["PenaltyConcededPoints"] = df["Conceded penalty"] * -1
    df["YellowCardPoints"] = df["Yellow Cards"] * -5
    df["RedCardPoints"] = df["Red Cards"] * -8
    # Compute total points from all these features
    df["TotalFeaturePoints"] = (
        df["TryPoints"] + df["AssistPoints"] + df["ConversionPoints"] +
        df["PenaltyPoints"] + df["DropGoalPoints"] + df["DefendersBeatenPoints"] +
        df["MetresCarriedPoints"] + df["OffloadsPoints"] + df["TacklesPoints"] +
        df["PenaltyConcededPoints"] + df["YellowCardPoints"] + df["RedCardPoints"]
    )
    # Log-transform TacklesPoints
    df['LogTacklesPoints'] = np.log1p(df['TacklesPoints'])
    # Assign team world rankings
    six_nations_rankings = {
        "Ireland": 91.36,
        "France": 87.19,
        "England": 83.63,
        "Scotland": 82.99,
        "Italy": 78.67,
        "Wales": 73.75
    }
    # Assign the country's ranking and its opposition's ranking
    df['CountryRank'] = df['Country'].map(six_nations_rankings)
    df['OppositionRank'] = df['Opposition'].map(six_nations_rankings)
    # Compute the team strength difference (higher = easier game for player), scaled to 0-5
    df['TeamRankDiff'] = df['CountryRank'] - df['OppositionRank']
    df['TeamRankDiff_Norm'] = (df['TeamRankDiff'] - df['TeamRankDiff'].min()) / (df['TeamRankDiff'].max() - df['TeamRankDiff'].min()) * 5
    # Standardize fantasy points (zero mean, unit variance)
    scaler = StandardScaler()
    df["Points_Norm"] = scaler.fit_transform(df[["TotalFeaturePoints"]])
    return df, scaler  # Return the scaler to inverse transform later
# Apply feature engineering
df, points_scaler = add_features(df)
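To make the "team strength difference" feature concrete, here is a small worked sketch using the same ranking scores. It assumes the smallest and largest differences in the data come from the top-ranked and bottom-ranked sides meeting (Ireland and Wales), which holds as long as both teams' rows for that fixture are in the dataset. The helper name rank_diff_norm is hypothetical.

```python
six_nations_rankings = {
    "Ireland": 91.36, "France": 87.19, "England": 83.63,
    "Scotland": 82.99, "Italy": 78.67, "Wales": 73.75,
}


def rank_diff_norm(country, opposition, rankings=six_nations_rankings):
    """Scale the ranking gap to 0-5, assuming the extremes are top vs bottom."""
    diff = rankings[country] - rankings[opposition]
    biggest_gap = max(rankings.values()) - min(rankings.values())
    # Min-max scaling with min = -biggest_gap and max = +biggest_gap.
    return (diff + biggest_gap) / (2 * biggest_gap) * 5


print(round(rank_diff_norm("Ireland", "Wales"), 2))   # 5.0, the easiest fixture
print(round(rank_diff_norm("Wales", "Ireland"), 2))   # 0.0, the hardest fixture
print(round(rank_diff_norm("Ireland", "France"), 2))  # 3.09
```

Two evenly ranked teams would land at 2.5, the midpoint of the scale.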
Training The AI Model¶
Before training my model, I first split my data into a training set (used to train the model) and a validation set (used to test its performance).
I use FastAI’s RandomSplitter(seed=42) to ensure reproducibility in my split:
splits = RandomSplitter(seed=42)(df)
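For intuition, a seeded random split can be sketched with the standard library alone. This is not fastai's implementation and will not reproduce its exact indices; it only shows why fixing the seed gives the same split on every run. The 20% validation share matches RandomSplitter's default valid_pct, and the row count of 100 is an arbitrary illustration.

```python
import random


def seeded_split(n_rows, valid_pct=0.2, seed=42):
    """Shuffle row positions with a fixed seed and hold out valid_pct of them."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_valid = int(n_rows * valid_pct)
    return indices[n_valid:], indices[:n_valid]  # (train, valid)


train_idx, valid_idx = seeded_split(100)
print(len(train_idx), len(valid_idx))  # 80 20

# Same seed, same split: results are comparable from one run to the next.
print(seeded_split(100) == seeded_split(100))  # True
```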
I then create dataloaders (dls) with the required parameters:
- splits=splits specifies the indices for the training and validation sets
- procs=[Categorify, FillMissing, Normalize]: Categorify turns categorical values into numerical ones, FillMissing fills missing values in numerical columns with the median, and Normalize standardizes all numeric columns (this is where most of the preprocessing and cleaning happens)
- cat_names=["Position", "Country", "Opposition", "Player Type", "Home/Away"] lists the categorical independent variables
- cont_names=["TryPoints", "AssistPoints", "ConversionPoints", ...] lists the continuous independent variables
- y_names="Points_Norm" is the dependent variable
- y_block=RegressionBlock() specifies that the model should perform regression, since Points_Norm is a continuous variable rather than a category
# Define the fastai Tabular DataLoader
dls = TabularPandas(
    df, splits=splits,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=["Position", "Country", "Opposition", "Player Type", "Home/Away"],
    cont_names=[
        "TryPoints", "AssistPoints", "ConversionPoints",
        "PenaltyPoints", "DropGoalPoints", "DefendersBeatenPoints",
        "MetresCarriedPoints", "OffloadsPoints", "LogTacklesPoints",
        "PenaltyConcededPoints", "YellowCardPoints", "RedCardPoints",
        "TeamRankDiff_Norm"
    ],
    y_names="Points_Norm", y_block=RegressionBlock(),
).dataloaders(path=".")
I also define a points accuracy metric to evaluate my model’s performance in a more intuitive way.
The points_accuracy function measures what percentage of the model’s predictions are within 3 points of the actual (real) values:
def points_accuracy(preds, targets):
    preds = preds.detach().cpu().numpy().flatten()
    targets = targets.detach().cpu().numpy().flatten()
    # Inverse transform the normalized predictions and targets
    preds = points_scaler.inverse_transform(preds.reshape(-1, 1)).flatten()
    targets = points_scaler.inverse_transform(targets.reshape(-1, 1)).flatten()
    # Compute accuracy within ±3 points, multiplied by 100 for percentage format
    return (abs(preds - targets) <= 3).mean().item() * 100
points_acc = AccumMetric(points_accuracy, name="points_acc")
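To see what this metric reports, here is the same idea in plain Python on values that are already in real points. The numbers are made up for illustration, and within_points is a hypothetical helper name rather than part of the notebook.

```python
def within_points(preds, targets, tolerance=3):
    """Percentage of predictions within `tolerance` points of the target."""
    hits = sum(abs(p - t) <= tolerance for p, t in zip(preds, targets))
    return hits / len(preds) * 100


predicted = [27.4, 12.0, 40.1, 8.5]  # made-up predictions, in real points
actual = [25, 16, 41, 8]             # made-up true scores

# Errors are 2.4, 4.0, 0.9 and 0.5, so three of the four are within 3 points.
print(within_points(predicted, actual))  # 75.0
```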
The data and model together form a Learner, which is the core of training in FastAI. To create one, I specify:
- dls (DataLoaders), the dataset, including training and validation splits
- layers=[10, 10], the neural network architecture, with two hidden layers, each containing 10 neurons
- metrics=[points_acc], the custom accuracy metric, which measures the percentage of predictions within ±3 points of the actual value
learn = tabular_learner(dls, metrics=[points_acc], layers=[10, 10])
Choosing how quickly the AI learns is important: too fast and it can make mistakes, too slow and it never improves. I use the lr_find() tool, which suggests a good learning rate (the speed of learning) to train with.
learn.lr_find(suggest_funcs=(slide, valley))
SuggestedLRs(slide=0.0063095735386013985, valley=0.00363078061491251)
The two points are both reasonable choices for a learning rate. I’ll pick a learning rate close to the slide to be safer (0.006) and train for a few epochs
learn.fit(10, lr=0.006) # Train the model
epoch | train_loss | valid_loss | points_acc | time |
---|---|---|---|---|
0 | 0.898104 | 0.703753 | 30.681818 | 00:00 |
1 | 0.583361 | 0.571632 | 35.227273 | 00:00 |
2 | 0.442244 | 0.411101 | 39.772727 | 00:00 |
3 | 0.358218 | 0.280930 | 44.318182 | 00:00 |
4 | 0.297234 | 0.163845 | 53.409091 | 00:00 |
5 | 0.258860 | 0.103790 | 67.045455 | 00:00 |
6 | 0.226888 | 0.070605 | 75.000000 | 00:00 |
7 | 0.202736 | 0.057595 | 82.954545 | 00:00 |
8 | 0.186270 | 0.045291 | 84.090909 | 00:00 |
9 | 0.169167 | 0.037126 | 88.636364 | 00:00 |
The model predicts a player’s fantasy score within ±3 points about 88.6% of the time on the validation set, a good result, especially considering it’s based solely on last year’s stats. I do believe there is room for improvement, perhaps by incorporating more recent player form, such as data from this year’s matches, or by adding contextual factors like the strength of the next opponent or whether the player is playing at home or away.
Final Predictions & Team Selection¶
I now want to get the model to give me a full team to pick and structure its response so that all I have to do is copy the players directly to my team.
I start by getting my model’s predictions of how many points each player will score, using this first line:
preds, _ = learn.get_preds(dl=dls.test_dl(df))
The rest of the code denormalises the PredictedPoints (from standardized values back to their real figures), selects only players from the official lineups, picks a captain and a supersub, and formats my output
# Predict Points for Team Selection
preds, _ = learn.get_preds(dl=dls.test_dl(df))
# Denormalise with the scaler fitted in add_features
df["PredictedPoints"] = points_scaler.inverse_transform(preds.numpy().reshape(-1, 1)).flatten()
df["Predicted Points"] = df["PredictedPoints"]
# Sort by Predicted Points
df = df.sort_values(by="Predicted Points", ascending=False)
# Only consider players from this week's lineups
squad_df = pd.read_csv("week_1_lineups.csv")
df_filtered = df.merge(squad_df[squad_df["Role"] == "Starters"], on=["Player", "Country"], how="inner")
# Define a position selection order
position_order = ["P", "H", "P", "L", "L", "FL", "FL", "N8", "SH", "FH", "W", "C", "C", "W", "FB"]
# Select best players while ensuring team balance
selected_players = []
selected_names = set()
country_limits = {}
for pos in position_order:
    for idx, row in df_filtered.iterrows():
        if (
            row["Position"] == pos
            and row["Player"] not in selected_names
            and country_limits.get(row["Country"], 0) < 4  # Max 4 players per nation
        ):
            selected_players.append(row)
            selected_names.add(row["Player"])
            country_limits[row["Country"]] = country_limits.get(row["Country"], 0) + 1
            df_filtered = df_filtered.drop(index=idx)
            break
# Select a captain as the player with the highest predicted points
captain = max(selected_players, key=lambda x: x["Predicted Points"])
captain["Role"] = "Captain"
captain_points = captain["Predicted Points"] * 2  # Double captain's points
# Select 1 substitute (SuperSub) from the subs list
subs_list = squad_df[squad_df["Role"] == "Subs"]["Player"].tolist()
sub_candidates = df[df["Player"].isin(subs_list)]
sub_player = sub_candidates.nlargest(1, "Predicted Points").iloc[0] if not sub_candidates.empty else None
# Create final squad
ultimate_team = pd.DataFrame(selected_players, columns=["Player", "Country", "Position", "Predicted Points"])
ultimate_team.index = range(1, 16)
ultimate_team.index.name = "Order"
# Modify the Predicted Points column for SuperSub
if sub_player is not None:
    sub_player["Role"] = "SuperSub"
    sub_player["Predicted Points"] = sub_player["Predicted Points"] / 2 * 3  # Divide by 2 then multiply by 3
    sub_df = pd.DataFrame([sub_player], columns=["Player", "Country", "Position", "Predicted Points"])
    sub_df.index = ["SuperSub"]
    ultimate_team = pd.concat([ultimate_team, sub_df])
# Add a row for the captain
captain_row = pd.DataFrame([{"Player": captain["Player"], "Country": captain["Country"], "Position": captain["Position"], "Predicted Points": captain_points}])
captain_row.index = ["Captain"]
ultimate_team = pd.concat([ultimate_team, captain_row])
# Display final squad
display(ultimate_team)
# Calculate total predicted points excluding the captain's original row
total_predicted_points = (
    ultimate_team.loc[ultimate_team.index != "Captain", "Predicted Points"].sum()
    - captain["Predicted Points"]
    + captain_points
)
# Display total points
print(f"\n🏆 Total Predicted Fantasy Points for the Ultimate Team: {total_predicted_points:.2f} 🏆\n")
| Player | Country | Position | Predicted Points |
---|---|---|---|---|
1 | P Schoeman | Scotland | P | 33.084534 |
2 | G Nicotera | Italy | H | 19.865785 |
3 | Z Fagerson | Scotland | P | 23.312636 |
4 | T Beirne | Ireland | L | 36.103783 |
5 | D Jenkins | Wales | L | 23.153835 |
6 | M Lamaro | Italy | FL | 25.294120 |
7 | F Cros | France | FL | 22.830290 |
8 | B Earl | England | N8 | 49.994165 |
9 | A Mitchell | England | SH | 29.350557 |
10 | P Garbisi | Italy | FH | 34.317686 |
11 | M Ioane | Italy | W | 39.102308 |
12 | O Lawrence | England | C | 34.324909 |
13 | H Jones | Scotland | C | 29.602992 |
14 | J Lowe | Ireland | W | 36.805535 |
15 | T Ramos | France | FB | 37.390402 |
SuperSub | D Sheehan | Ireland | H | 69.338139 |
Captain | B Earl | England | N8 | 99.988330 |
🏆 Total Predicted Fantasy Points for the Ultimate Team: 593.87 🏆
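The total can be checked by hand from the table: add up the 15 starters, add the SuperSub's adjusted score, then add the captain's base score once more (his doubled score minus the single copy already counted among the starters). A quick sketch of that arithmetic, using the figures exactly as displayed above:

```python
# Predicted points for the 15 starters, in table order.
starters = [
    33.084534, 19.865785, 23.312636, 36.103783, 23.153835,
    25.294120, 22.830290, 49.994165, 29.350557, 34.317686,
    39.102308, 34.324909, 29.602992, 36.805535, 37.390402,
]
supersub = 69.338139       # D Sheehan, already adjusted in the table
captain_base = 49.994165   # B Earl's score before doubling

# Doubling the captain adds one extra copy of his base score.
total = sum(starters) + supersub + captain_base
print(round(total, 2))  # 593.87
```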
And there it is: my team for week 1!
Since my model is based only on last year’s tournament, there are naturally some biases. Some players may have been standout performers in 2024 but might not replicate that form this year. Others, like a certain Antoine Dupont, are completely missing — not because of poor form, but simply because he was busy preparing to win Olympic gold.
That said, I think this looks like a solid squad to kick off the tournament. The AI has picked a strong mix of players and some interesting selections like Dan Sheehan as the SuperSub, which could be a game-changer.
If you found this article interesting, useful, or even just a bit amusing, I plan to follow up with updated predictions after the first few rounds of the competition. With fresh data and new features, I’ll see how much better the model can get.
Thank you for reading, and I hope to see you on the leaderboards!