MySQL tables – character set and collation

Featured Image

There are two defines in wp-config.php of WordPress website:

define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');

There are several things which are commonly misunderstood. Names of constants in those defines, might suggest that they are related to the database itself. They are not. They are related to tables within the database.

Database creation is independent from table creation. WordPress does not create a database and does not care about database's default character set and collation, if it can connect to the database.

The value 'utf8' in the first define means, the least restricted character set from 'utf8' family, which is either 'utf8' or 'utf8mb4'.

If you leave defines above unchanged, before an attempt to install your website, it is like telling WordPress to make its own choices, regarding database's tables character set and collation, which are supported by MySQL (depending of MySQL version) and are least limiting.

The following are the things, WordPress analyses to determine its choices, during installation:

  • MySQL's version
  • database's collation ( in wp-config.php )

Based on MySQL's version, WordPress decides, which group of utf8 family to use. There are two, distinguished by their names: utf8 and utf8mb4. Character sets from utf8 group, allow storing of a maximum 3-bytes long characters. Character sets from utf8mb4 group, allow storing of a maximum 4-bytes long characters.

Now, WordPress checks the value of DB_COLLATE define. If empty, it will use the least limiting collation from chosen utf8 family, otherwise, will use the value specified.

Examples

define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');


If MySQL does not support utf8mb4 (older versions) then tables' character set will be utf8 and collation will be utf8_general_ci. Otherwise, we can expect utf8mb4 and utf8mb4_unicode_520_ci, or utf8mb4_unicode_ci (MySQL version dependent), respectively.

define('DB_CHARSET', 'utf8');
define('DB_COLLATE', 'utf8_polish_ci');


Older MySQL version - utf8 and utf8_polish_ci. Newer MySQL version - utf8mb4 and utf8mb4_polish_ci (_polish_ci suffix is honored )

define('DB_CHARSET', 'cp1250');
define('DB_COLLATE', 'cp1250_polish_ci');


Any MySQL version - cp1250 and cp1250_polish_ci.

define('DB_CHARSET', 'cp1250');
define('DB_COLLATE', 'utf8_general_ci');


Any MySQL version - error (mismatch of character set and collation)

Summary

In most cases, leaving values of defines, explained above, unchanged, is a viable choice. But, if you want tables collation to match the language of your website, you can modify the value of DB_COLLATE define, appropriately (for example - utf8mb4_polish_ci).

Leave a Reply

Links are not allowed in comment field!

Your email address will not be published. Required fields are marked *