Rococo's standard ID
Context and Problem Statement¶
We have multiple "standards" when it comes to entity_id and related ID values. The original intention was to use native Python UUIDs. There have been a lot of errors and discussions around converting those to strings with and without dashes.
We want to be able to generate an ID anywhere (including the browser) and have it be unique. However, it only needs to be unique for a given model/object/data type.
Having the same entity_id for a Person and for a Team is weird, but it's acceptable.
Most of our services use UUID v4 (fully random, not sortable).
Considered Options¶
- Native Python UUIDs
- UUIDv4 string
- UUIDv7 string
- Distributed global IDs, notably Twitter's Snowflake, Sonyflake, and Baidu UID generator (strings)
- string in Stripe ID format
Decision Outcome¶
Our standard ID is a 32-character string: a version 4 UUID converted to hex with the dashes removed.
The Rococo library should provide an API to generate the ID (e.g. a generate_id() function).
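A minimal sketch of what such a function could look like, using only the Python standard library. The name generate_id comes from the decision above; the body is an assumption about one straightforward way to meet the stated format, not the library's actual implementation.

```python
import uuid


def generate_id() -> str:
    """Return a version 4 UUID as a 32-character hex string without dashes."""
    # UUID.hex is the lowercase hexadecimal form with the dashes already removed.
    return uuid.uuid4().hex


new_id = generate_id()
print(new_id, len(new_id))
```

The result can still be parsed back with uuid.UUID(new_id) when a native UUID object is needed.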
Why:
- This is the easiest to implement at the moment.
- Disk space and performance concerns are negligible in the current production projects. It is acceptable to use non-sequential IDs.
- Rococo doesn't impose constraints on the ID string length and is not expected to do so in the future, so a new project can use a different ID format if it needs one for performance, scalability or other reasons.
- Converting to hex is an optimization (at least for MySQL).
Pros and Cons of the Options¶
Collected from the Slack discussion in the rococo channel.
All reviewed options possess sufficient uniqueness to serve as a unique ID that can be generated anywhere.
Native Python UUIDs (uuid.uuid4())¶
Pros:
API familiar to all Python developers (and most of our services are written in Python).
Cons:
Not supported natively by some of the database APIs we use, so they would require additional conversion at the database repository level.
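To illustrate the kind of conversion this implies, here is a small sketch of a pair of helpers a repository layer might need. The names to_db_value and from_db_value are hypothetical and only serve the example.

```python
import uuid


def to_db_value(entity_id: uuid.UUID) -> str:
    """Convert a native UUID to a 32-character hex string for storage."""
    return entity_id.hex


def from_db_value(value: str) -> uuid.UUID:
    """Convert a stored string back to a native UUID.

    uuid.UUID accepts the hex form both with and without dashes.
    """
    return uuid.UUID(value)


original = uuid.uuid4()
stored = to_db_value(original)
restored = from_db_value(stored)
print(stored, restored == original)
```

Every read and write path would have to go through helpers like these, which is where the dashed versus undashed string mismatches tend to creep in.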
UUIDv4 string¶
Pros:
Simple format, known to most of the developers. (Not much to mention here).
Cons:
UUID v4 values are randomly generated and do not contain any information about the time they were created or the machine that generated them.
As such, they have no locality: writes and reads will occur all over the index (no matter what index data structure you have).
With random identifiers, the working set eventually becomes all of the data and your database will be page faulting continuously on large data sets.
UUIDv7 string¶
Pros:
UUID v7 values are time-ordered and increase roughly monotonically (they use a 48-bit timestamp in milliseconds since the Unix Epoch, filling the rest with random data).
This gives sortability and thus better indexing and sharding in the database.
The key advantage of UUIDv7 is that it enables temporal locality, so when your database needs to check uniqueness constraints (for idempotent processing), only a small portion of the index will be hot (recent items).
Cons:
Marginally worse uniqueness: 48 of the 128 bits hold the timestamp, leaving 74 random bits against 122 random bits in UUID v4.
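The bit layout described above can be sketched by hand as follows. This is an illustrative assumption of how a UUIDv7 hex string could be assembled, not a production generator; the function name uuid7_hex is hypothetical, and IDs created within the same millisecond are ordered randomly in this sketch.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_hex(timestamp_ms: Optional[int] = None) -> str:
    """Build a UUIDv7-style value and return it as 32 hex characters."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                # top 12 bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 bits

    value = (timestamp_ms & ((1 << 48) - 1)) << 80  # 48-bit Unix timestamp in ms
    value |= 0x7 << 76                               # 4-bit version field
    value |= rand_a << 64                            # 12 random bits
    value |= 0b10 << 62                              # 2-bit variant field
    value |= rand_b                                  # 62 random bits
    return uuid.UUID(int=value).hex


earlier = uuid7_hex(1_000)
later = uuid7_hex(2_000)
print(earlier < later)
```

Because the timestamp occupies the leading hex characters, plain string comparison orders IDs from different milliseconds by creation time.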
Snowflake, Sonyflake, Baidu UID generator (strings)¶
Pros:
Compared to UUIDv4, distributed global IDs provide better performance due to sortability and temporal locality (same as UUID v7).
They also occupy less space on disk, e.g. a Snowflake ID occupies 64 bits, while UUID v4 / v7 occupies 128 bits.
Cons:
Slightly more complex to implement, compared to UUID v4 / v7.
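To give a sense of the extra complexity, here is a rough sketch of a Snowflake-style generator using the commonly described layout of 41 timestamp bits, 10 machine bits and 12 sequence bits. The class name, the default epoch and the clock handling are assumptions made for illustration; real deployments also need a scheme for assigning machine IDs.

```python
import threading
import time


class SnowflakeGenerator:
    """Illustrative Snowflake-style generator (hypothetical, not a library API)."""

    def __init__(self, machine_id: int, epoch_ms: int = 1_600_000_000_000):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms  # assumed custom epoch, chosen arbitrarily
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> str:
        with self._lock:
            now = time.time_ns() // 1_000_000
            if now < self._last_ms:
                now = self._last_ms  # clock moved backwards: reuse the last timestamp
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = time.time_ns() // 1_000_000
            else:
                self._sequence = 0
            self._last_ms = now
            value = ((now - self._epoch_ms) << 22) | (self._machine_id << 12) | self._sequence
            return str(value)


generator = SnowflakeGenerator(machine_id=1)
ids = [int(generator.next_id()) for _ in range(5000)]
print(ids[0], ids[-1])
```

Unlike the UUID options, this needs shared state (the last timestamp and sequence) and a unique machine ID per generator, which makes it harder to run in a browser.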
Stripe ID (string)¶
Pros:
- Easy to debug.
- No dashes, so a double click selects the whole ID.
Cons:
None were mentioned.
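For reference, a Stripe-style ID is typically a short type prefix, an underscore, and a random alphanumeric string. The sketch below assumes that shape; the function name generate_prefixed_id, the prefix and the length are hypothetical choices for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_prefixed_id(prefix: str, length: int = 24) -> str:
    """Return an ID such as 'person_<24 random alphanumeric characters>'."""
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}_{random_part}"


person_id = generate_prefixed_id("person")
print(person_id)
```

The prefix makes it obvious in logs which kind of entity an ID belongs to, which is what makes this format easy to debug.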