How to choose the number of shards is an interesting question. He recommends 12 as an example because it factors as 2*6 and 3*4, so the shards can be divided evenly among 2, 3, 4, 6, or 12 machines.
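To make the divisibility argument concrete, here is a minimal sketch. The function names `even_machine_counts` and `assign_shards` are made up for illustration and are not from the article; the round-robin placement is also just an assumption about how shards might be mapped to machines.

```python
def even_machine_counts(num_shards):
    """Return every machine count that divides num_shards with no remainder."""
    return [m for m in range(1, num_shards + 1) if num_shards % m == 0]


def assign_shards(num_shards, num_machines):
    """Place shards on machines round-robin (hypothetical placement policy)."""
    assignment = {machine: [] for machine in range(num_machines)}
    for shard in range(num_shards):
        assignment[shard % num_machines].append(shard)
    return assignment


if __name__ == "__main__":
    print(even_machine_counts(12))
    for machines in even_machine_counts(12):
        sizes = [len(s) for s in assign_shards(12, machines).values()]
        print(machines, "machines ->", sizes)
```

With 12 shards, every machine count in the list gets the same number of shards per machine, whereas a count such as 5 would leave some machines holding more shards than others.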
He also suggests thinking of splitting a shard as a tree node split. It would look more formal if we thought of the article's hashing function for sharding as linear hashing.
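For readers unfamiliar with linear hashing, the sketch below shows the idea of splitting one shard at a time. It is my own illustration, not the article's scheme: the class name `LinearHashShards`, the choice of two initial shards, and the use of non-negative integer keys with `key % size` as the hash are all assumptions made to keep the example small and deterministic.

```python
class LinearHashShards:
    """Toy linear hashing: shards are split one at a time, in order."""

    def __init__(self, initial_shards=2):
        self.n0 = initial_shards   # number of shards at level 0
        self.level = 0             # how many full doubling rounds are done
        self.split_ptr = 0         # next shard to be split in this round
        self.shards = [[] for _ in range(initial_shards)]

    def _shard_index(self, key):
        # Assumes key is a non-negative integer (illustrative hash: modulo).
        size = self.n0 * (2 ** self.level)
        idx = key % size
        if idx < self.split_ptr:
            # This shard was already split in the current round,
            # so address it with the next level's (doubled) modulus.
            idx = key % (size * 2)
        return idx

    def insert(self, key):
        self.shards[self._shard_index(key)].append(key)

    def contains(self, key):
        return key in self.shards[self._shard_index(key)]

    def split_next(self):
        """Split the shard at split_ptr into itself and one new shard."""
        size = self.n0 * (2 ** self.level)
        old_keys = self.shards[self.split_ptr]
        self.shards[self.split_ptr] = []
        self.shards.append([])     # new shard lands at index split_ptr + size
        self.split_ptr += 1
        if self.split_ptr == size:
            # Every shard of this round has been split: start the next round.
            self.level += 1
            self.split_ptr = 0
        for key in old_keys:
            self.insert(key)       # rehash only the keys of the split shard


if __name__ == "__main__":
    table = LinearHashShards(initial_shards=2)
    for k in range(8):
        table.insert(k)
    table.split_next()
    print(table.shards)
```

Only the keys of the shard being split are moved, which is what makes it resemble a tree node split: one node is divided in two while the rest of the structure stays put.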
It’s also worth mentioning that Facebook does not use a clustering algorithm when sharding data, as they found that no algorithm improved performance.