What are the pros/cons of uuid vs auto-increment serial for the primary key?
Application Type
Mobile, Reactive, Service

We're migrating a solution from FileMaker into Outsystems, and we previously used all UUID's as primary keys. I know Outsystems can manage the local storage to server storage syncing process such that auto-increment serials can still work, but I'm curious what is recommended as best practice today. Intuitively my instinct is to always make the primary key a UUID such that multiple servers could be accessed at the same time creating records and clustered together. Let's assume we might see 1 million simultaneous users some day, and those users are accessing servers in different AWS regions to optimize latency. 

I'm considering using auto-increment serial for all primary keys for optimal storage size and performance, but then use a UUID for all external API interactions in order to never expose or make the primary key visible to a user. However, it's not clear to me how this approach can scale.

What are the pros/cons of UUID vs auto-increment serial?

Hello Sam Riggleman,

By referring to some articles, I got the information below:

UUID

There are 5 standard UUID formats nowadays. Most of the time, people either choose v4 (random UUID) or v1 (timestamp UUID)

  • Globally unique. e.g. No false positive for finding items using log. Easy for migrating data between systems since collision is only theoretically possible.
  • Stateless, it can be generated on the fly.
  • A sense of security since the malicious users can't guess the ID. However, your security team would always insist that a publicly accessible UUID path does not meet the security standard.
  • Version 1 UUID stores timestamp info, which could be useful sometimes.
  • Not readable.
  • Not naturally sortable according to creation time. Though v1 UUID format contains the timestamp, it encodes the timestamp using little-endian in that the least significant time appears first, which renders the UUID hard to sort according to creation time. People design their own UUID format to fix this and there is also a draft proposal to standardize it.
  • For databases like MySQL, Oracle, which use clustered primary key, version 4 randomly generated UUID will hurt insertion performance if used as the primary key. This is because it requires reordering the rows to place the newly inserted row at the right position inside the clustered index. On the other hand, PostgreSQL uses heap instead of the clustered primary key, thus using UUID as the primary key won't impact PostgreSQL's insertion performance.

Auto Increment Integer/Serial

Using auto-increment integer/serial as the primary key is also quite common and every major database engine provides native support. e.g.

  • MySQL - AUTO_INCREMENT
  • PostgreSQL - SERIAL
  • SQLite - AUTOINCREMENT
  • Readable. This is especially valuable if we expose it externally. Thinking of issue id, obviously, issue-123 is much more readable than issue-b1e92c3b-a44a-4856-9fe3-925444ac4c23.
  • Less space. UUID always occupies 16 bytes. For Auto Increment Integer, when stored as in long format, it occupies 8 bytes. If the table itself has only a few columns, the extra primary key space overhead will become more significant.
  • It can't be used in the distributed system since it's quite likely that different hosts could produce exactly the same number.
  • It can't be generated on the fly. Instead, we must consult the database to figure out the next available primary key. In a distributed system, this often means introducing a separate service to produce this sequential number. And that service becomes a single-point-of-failure (SPOF).
  • Some business data can be exposed since the latest ID could represent the total number of inventories. Attackers can also scan the integer range to explore leakage (though it shouldn't happen if ACL is implemented correctly).

Which one to choose?

As listed above, there are Pros and Cons between the 2 approaches.  95% of the time, the default choice should always be Auto Increment Integer. Why?

Readability, and readability leads to simplicity. Numbers are easy to write, easy to remember and easy to communicate. The primary key is not only used by the system, it's also exposed to the end user (e.g. order #), inspected by the operation engineer, customer support etc...

99.9% of the applications won't reach internet scale and they just consist of several models allowing CRUD operations, containing thousands of records. And doesn't need a distributed system.

Take the classic issue tracking/project management tool as an example. The tool likely will have at most 5 figure projects each containing 5 figure issues. and issue id such as issue/123 is definitely more readable than issue/b1e92c3b-a44a-4856-9fe3-925444ac4c23. In fact, all major issue tracking systems use an integer as the issue id. Jira, Apple's Radar, Google's issue tracker, etc... And most applications are less complex than those issue tracking tools.

There are valid cases of choosing UUID e.g. log entry. But most of the time, using UUID as the primary key is a sign of pre-mature optimization and it's also a choice hard to revert afterward.


Hello Riyas.

Very good analysis. 

For the usual cases, I also prefer autonumber solution.


Regards

Community GuidelinesBe kind and respectful, give credit to the original source of content, and search for duplicates before posting.