Hanabi Competitions

Welcome to Hanabi Competitions! This website is used to organize events where players self-assemble into teams to compete on a set of Hanabi deals. We currently only support play on hanab.live.

How it works

Check the homepage for details of active competitions. We also encourage you to join the Hanabi Central Discord server, where new competitions and competition results are announced (refer to the #hc-* channels there).

In order to compete, follow these easy steps:

Create an account on hanab.live (it's as simple as entering a username and password).
Organize a team of the number of players specified by the competition rules.
For each competition deal, have a player create a table on hanab.live by following the links found on the homepage for that competition.

That's it. Game results are collected automatically. Final competition standings are posted following the conclusion of the competition period.

Scoring

Competition scoring system

Competition scoring is based on the matchpoints system used in bridge. On each deal, a team gets 2 points for each team they beat, and 1 point for each team they tie. Null results, i.e. the unplayed deals of a team that plays some, but not all, of the competition deals, do not count as "beaten", so no team earns points against them. Although play is by team, ranking is done on an individual basis, in order to account for teams that for whatever reason have to change members in the middle of a competition; regardless, we recommend playing with the same team (and on the same hanab.live accounts) for each game.
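As a rough illustration of the matchpoint arithmetic for a single deal, here is a minimal Python sketch. The function names, the dictionary layout, and the (game score, turns) result pair are assumptions made for the example, not the site's actual code; a team that did not play the deal is represented by None.

```python
from typing import Dict, Optional, Tuple

# Hypothetical result representation: (game score, turns taken).
Result = Tuple[int, int]


def compare(a: Result, b: Result) -> int:
    """Return 1 if a beats b, -1 if b beats a, 0 on a tie.

    Assumes Standard ranking: higher game score first, then fewer turns.
    """
    key_a = (a[0], -a[1])
    key_b = (b[0], -b[1])
    return (key_a > key_b) - (key_a < key_b)


def deal_matchpoints(results: Dict[str, Optional[Result]]) -> Dict[str, int]:
    """Award 2 points per team beaten and 1 per team tied on one deal.

    Teams mapped to None (a null result) score 0 and are not "beaten",
    so nobody earns points against them.
    """
    played = {team: res for team, res in results.items() if res is not None}
    points = {team: 0 for team in results}
    for team, res in played.items():
        for other, other_res in played.items():
            if other == team:
                continue
            outcome = compare(res, other_res)
            if outcome > 0:
                points[team] += 2
            elif outcome == 0:
                points[team] += 1
    return points


example = deal_matchpoints({
    "team_a": (25, 50),
    "team_b": (25, 50),
    "team_c": (24, 48),
    "team_d": None,  # did not play this deal
})
```

In this example team_a and team_b each earn 3 points (one tie plus one win), while team_c and team_d earn none.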

Game result ranking

There are two types of game result ranking; each competition will use one of them:

Standard: results are ranked first by game score, and second by the turns score, i.e. the number of turns taken in the game.
Speedrun: results are ranked first by game score, and second by game duration.

Game score is the number of cards successfully played by the end of the game, even though hanab.live assigns a score of 0 to strikeouts and terminations.
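The two ranking types can be sketched as sort keys. This is only an illustration: the GameResult class and its field names are hypothetical, and it assumes that fewer turns (Standard) and a shorter duration (Speedrun) rank higher among results with equal game score.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GameResult:
    team: str
    cards_played: int      # game score, counted even for strikeouts
    turns: int             # turns score
    duration_seconds: int  # game duration


def rank_standard(results: List[GameResult]) -> List[GameResult]:
    """Best first: higher game score, then fewer turns."""
    return sorted(results, key=lambda r: (-r.cards_played, r.turns))


def rank_speedrun(results: List[GameResult]) -> List[GameResult]:
    """Best first: higher game score, then shorter duration."""
    return sorted(results, key=lambda r: (-r.cards_played, r.duration_seconds))


games = [
    GameResult("team_a", 25, 56, 1500),
    GameResult("team_b", 25, 52, 2100),
    GameResult("team_c", 23, 49, 900),  # e.g. a strikeout after 23 cards
]
standard_order = [g.team for g in rank_standard(games)]
speedrun_order = [g.team for g in rank_speedrun(games)]
```

With these made-up results, team_b leads under Standard ranking (fewer turns), team_a leads under Speedrun ranking (shorter game), and team_c is last in both because its game score is lower.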

Competition series

A series is a collection of competitions. Each participant in a series is scored according to the sum of fractional matchpoints — the fraction of available matchpoints won in a competition — achieved by that player across eligible competitions. A series may have first-x and/or top-y scoring, meaning only a player's first x and top y competitions count toward this sum. This design aims to give players an opportunity to miss one or more competitions in a series, relieving some scheduling pressure.
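A small sketch of the series sum follows. It takes one possible reading of first-x and top-y scoring (restrict to the first x competitions a player entered, then keep the best y of those); the function names and the chronological list of fractions are assumptions for the example.

```python
from typing import Optional, Sequence


def fractional_matchpoints(won: int, available: int) -> float:
    """Fraction of the available matchpoints won in one competition."""
    return won / available


def series_score(
    fractions: Sequence[float],
    first_x: Optional[int] = None,
    top_y: Optional[int] = None,
) -> float:
    """Sum a player's fractional matchpoints over eligible competitions.

    `fractions` is assumed to be in the order the competitions were played.
    """
    eligible = list(fractions)
    if first_x is not None:
        eligible = eligible[:first_x]  # only the first x competitions
    if top_y is not None:
        eligible = sorted(eligible, reverse=True)[:top_y]  # best y of those
    return sum(eligible)


played = [0.5, 0.75, 0.25, 1.0]
total = series_score(played, first_x=3, top_y=2)
```

Here the fourth competition falls outside the first three, and the weakest of the remaining three is dropped, so the total is 0.75 + 0.5 = 1.25.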

There are also special all-time leaderboards, which consider every competition in history matching some characteristic, but use the median score across all of them, scaled up by a function sublinear in the number of played competitions (to prevent camping on a small number of high outlier performances).
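To see why a sublinear scaling discourages camping, consider the sketch below. The actual scaling function is not stated here, so the square root is purely a placeholder assumption, as is the function name.

```python
import math
from statistics import median
from typing import Sequence


def all_time_score(fractions: Sequence[float]) -> float:
    """Median fractional matchpoints, scaled by a sublinear function of n.

    sqrt(n) is a stand-in chosen for illustration only; the real
    leaderboards may use a different sublinear function.
    """
    if not fractions:
        return 0.0
    return median(fractions) * math.sqrt(len(fractions))


camper = all_time_score([1.0, 1.0])   # two outstanding results
regular = all_time_score([0.6] * 16)  # sixteen solid results
```

Under this placeholder, the player with sixteen solid results outscores the player sitting on two perfect ones.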

Universal competition rules

Teammates may not communicate by any means except allowed game actions.
We explicitly forbid reading into the length of time it takes for a teammate to decide on an action.
Each player in a game must be a distinct human (no bots or solitaire shenanigans).

We do not plan to proctor competition games, so obviously this is based on a code of honour.

Player aliases

It is common for a player to make several different accounts on hanab.live. Although we insist that you use the same account for all your competition play, we recognize that mistakes happen. If you accidentally use an alternate account, contact an administrator, providing the name of your main account, as well as the names of all alternate accounts you may have used (indicating clearly which is the main account). We will update your results accordingly.

Table creation parameters

There are two additional parameters that you may wish to set when you create a table using the generated links on the homepage: the table password and card cycling. If you don't know what the latter is, don't worry about it. They can respectively be set by appending the following snippets to the end of the URL:

&password=myurlencodedpassword
&cardCycle=true
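If your password contains spaces or symbols, it needs to be URL-encoded before being appended. A minimal Python sketch, in which the helper name and the placeholder link are hypothetical (real links come from the competition's homepage):

```python
from typing import Optional
from urllib.parse import quote


def with_table_options(
    link: str,
    password: Optional[str] = None,
    card_cycle: bool = False,
) -> str:
    """Append the optional table parameters to a generated link."""
    if password is not None:
        # safe="" percent-encodes every reserved character, including & and /
        link += "&password=" + quote(password, safe="")
    if card_cycle:
        link += "&cardCycle=true"
    return link


# Placeholder link for illustration only.
base = "https://example.invalid/create-table?name=demo"
url = with_table_options(base, password="my pass&word", card_cycle=True)
```

The resulting link ends in &password=my%20pass%26word&cardCycle=true.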

Searching across competitions

You can search across the entire set of competition games using arbitrary SQL constituting a WHERE clause. Here are the columns you can constrain:

competition_name
final_rank
fractional_MP
sum_MP
player_name
base_seed_name
seed_matchpoints
replay_URL
site_game_id
score
turns
datetime_game_started
datetime_game_ended
character_name
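To give a feel for what such a WHERE clause might look like, the sketch below runs one against a throwaway in-memory SQLite table. The table name, column types, and sample rows are invented for the example, and the site's actual database engine may differ in dialect details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding a subset of the searchable columns.
conn.execute(
    """
    CREATE TABLE games (
        competition_name TEXT,
        player_name TEXT,
        score INTEGER,
        turns INTEGER,
        final_rank INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?, ?)",
    [
        ("demo-comp", "alice", 25, 52, 1),
        ("demo-comp", "bob", 25, 52, 1),
        ("demo-comp", "avery", 24, 50, 3),
        ("demo-comp", "anna", 25, 58, 4),
    ],
)

# The part a user would type into the search: just the WHERE condition.
where_clause = "score = 25 AND turns <= 55 AND player_name LIKE 'a%'"
matches = conn.execute(
    f"SELECT player_name, score, turns FROM games WHERE {where_clause}"
).fetchall()
```

With these sample rows, only alice's game satisfies all three conditions.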

Competition design philosophy

Competition cadence

We started with a weekly cadence for the competitions, but found it became exhausting to find time for not only playing the deals, but also practising under the ruleset and developing specialized strategies. We now run at a biweekly cadence. All of the admins are happy with this arrangement; however, based on a poll, a not insignificant number of competitors prefer a weekly cadence. This choice is not set in stone, particularly with respect to different competition series.

Scoring system

Ranking strikeouts and terminations

The official Hanabi rules, which have been published in several different forms (not only for different rulebook versions, but also for different localizations), offer surprisingly little guidance on how to score strikeouts. To summarize across all these different versions, a team that reaches three strikes "loses". In online implementations of Hanabi, such as those on Board Game Arena, keldon.net, and more recently hanab.live, this has traditionally been represented as a score of 0. The same is true of terminated games, whose result can be reasonably described in a manner similar to that for strikeouts: namely, "loss".

In my humble opinion, assigning these two game finish states a score of 0 is a rather arbitrary way of coercing them into a "finished normally" state. As such, I see no value in this choice for competitions beyond its implementation simplicity, given that we're pulling game data from hanab.live.

Ranking games that are heterogeneous in finish state obviously requires some creativity. Here are several questions relating to the principle of competition that we considered when deciding on the current policy: awarding the game score that was achieved just prior to the strikeout (or termination), and the turns score that includes the turn where the third strike was earned (or where the game was terminated). Consider a competition for a variant with max score 25:

Does a strikeout/termination at 24 game score indicate a better performance than a strikeout at 1 point?
Does a strikeout/termination at 24 game score indicate a better performance than a regular game finish at 1 point?
Does a strikeout/termination at 24 game score and 50 turn score indicate a better performance than a regular game finish at 24 game score and 60 turn score?
Does a strikeout/termination at 24 game score and 50 turn score indicate an equivalent performance to a regular game finish at 24 game score and 50 turn score?
Will it ever be advantageous for a team to intentionally strike out? To intentionally terminate the game?

And our corresponding opinions:

Yeah, duh.
Yes. Getting a score of 1 is so bad as to be a contrived situation.
Yes. The team who finished normally played much less efficiently, and still managed to lose a point.
At the very least, it's close enough that we don't have a strong opinion on which gets ranked higher. Default to the simplest choice, which is treating them on even footing.
There is only one type of situation where it is advantageous for an individual to intentionally strike out or terminate. That situation is when it is predicted with high likelihood that a teammate will a) strike out or b) fail to score a remaining possible point. The latter is virtually impossible, since even at 0 clues, at least one remaining play can be communicated by a positional discard, regardless of the amount of context or true information on any hand. The former is not impossible, but it is highly improbable to affect the competition rankings in a meaningful way; an intentional termination could at best improve the turns score by 2, and the game score for strikeouts is typically low enough that no other games with that game score are recorded, meaning turns score doesn't even factor in.
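The first four answers fall out directly if the finish state is simply ignored when ranking. The sketch below checks this with a hypothetical rank key; the Finish type is invented for the example, and the turn counts of the 1-point games are made up, since the questions do not specify them.

```python
from typing import NamedTuple, Tuple


class Finish(NamedTuple):
    game_score: int  # cards played just before the game ended
    turns: int       # includes the turn of the third strike or termination
    normal: bool     # True for a regular finish; not used by the ranking


def rank_key(f: Finish) -> Tuple[int, int]:
    """Higher key ranks higher: more cards played, then fewer turns."""
    return (f.game_score, -f.turns)


strikeout_24 = Finish(24, 50, normal=False)

# 1. Strikeout at 24 vs. strikeout at 1 point.
q1 = rank_key(strikeout_24) > rank_key(Finish(1, 40, normal=False))
# 2. Strikeout at 24 vs. regular finish at 1 point.
q2 = rank_key(strikeout_24) > rank_key(Finish(1, 40, normal=True))
# 3. Strikeout at 24 in 50 turns vs. regular finish at 24 in 60 turns.
q3 = rank_key(strikeout_24) > rank_key(Finish(24, 60, normal=True))
# 4. Same game score and turns, different finish state: even footing.
q4 = rank_key(strikeout_24) == rank_key(Finish(24, 50, normal=True))
```

All four comparisons evaluate to True under this key, matching the opinions listed above.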

With all that said, our chosen solution seems to satisfy the best balance of performance measurement fidelity and simplicity. There are several slightly different approaches that are also reasonable, but unless you think you have a very compelling reason why one is better than the status quo, we'd recommend that you not start a discussion on this topic.

Accounting for null results

There's a different type of game result than those discussed in the previous section: the null result, which occurs when a competing team plays some but not all of the deals (a team has to play in at least one deal to be considered as having competed in a competition). We considered several tweaks to the traditional matchpoint system to account for null results, but ultimately decided against each.

Although it is a feel-bad moment to rank last in a deal and receive no matchpoints, same as if you had not competed at all (not to mention that every other team ~~stole your lunch money~~ won matchpoints off you), awarding participatory matchpoints has the following negative consequences:

It incentivizes bad-faith participation, i.e. creating a table with no intention of actually trying to play the game well. Teams who wouldn't have time to play through a deal could create and then immediately terminate a game in order to receive some free matchpoints. If this happened to multiple teams, there would be a bizarre race-to-one-above-the-bottom, where they would try to barely out-compete each other.
It inflates each team's fractional matchpoints, rendering it no longer a direct measurement of win rate.

One final consideration is that participation matchpoints are an added complexity. In light of all this, we decided to use the traditional matchpoints system, though with a 2-1 allotment of points rather than the 1-½ allotment used by the ACBL.
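Note that the choice between a 2-1 and a 1-½ allotment only changes the scale of the raw totals, not the fraction of available matchpoints won. A quick check, using a hypothetical helper and made-up win and tie counts:

```python
from fractions import Fraction


def fractional_mp(
    wins: int,
    ties: int,
    opponents: int,
    win_pts: Fraction,
    tie_pts: Fraction,
) -> Fraction:
    """Matchpoints won divided by matchpoints available on one deal."""
    won = wins * win_pts + ties * tie_pts
    available = opponents * win_pts  # beating every opponent is the maximum
    return won / available


# A team that beats 5 and ties 2 of its 9 opponents on a deal.
two_one = fractional_mp(5, 2, 9, Fraction(2), Fraction(1))
one_half = fractional_mp(5, 2, 9, Fraction(1), Fraction(1, 2))
```

Both allotments give the same fraction, 2/3, so the 2-1 allotment mainly has the convenience of keeping raw totals as whole numbers.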

Number of turns taken as a score

Our initial ideas for scoring Hanabi Competitions mostly involved some combination of cards left in deck, clues left, and final round turns taken. Treating clues and deck size differently leads to some weird tensions in lategame decision-making, as we learned from competitions held by a certain other Hanabi group that shall remain nameless. Then, we realized that cards in deck and final rounds are essentially the same concept, which can be captured by the concept of virtual cards in deck, e.g. if two final round turns have been taken, there are -2 virtual cards left in deck. Finally, we realized that summing clues and virtual cards in deck was nearly equivalent to the number of turns taken, with two exceptions; in the turn score approach:

the first two strikes don't necessarily cost the team anything;
clues returned to the team by playing terminal cards (i.e. 5s, usually) don't add to the score.

These two differences combine to create some interesting strategic consequences that we consider to be beneficial to competition play. In easier rulesets, a good team will not uncommonly find themselves at the maximum clue count, at which point they may not take a discard action, which is the typical way of advancing the deck when no playable cards are held. This effectively creates a lower bound on the number of turns that must be taken in some deals, which consequently could make it more difficult to distinguish between performances by top teams. However, the aforementioned differences mean that while at the maximum clues: a) a player may intentionally misplay a card in order to advance the deck; b) playing a terminal card can improve a team's standing by denying them a clue, and thus reducing the number of required stall turns.