Split char-separated values in MySQL

Reading time ~6 minutes

Introduction

Recently I needed to extract a list of URLs from a MySQL table. All those URLs were inserted in a single TEXT column, however they were all separated by semicolons. The situation was more or less like the following:

SELECT * FROM links

---------------------------------------------------------------
| id | urls                                                   |
---------------------------------------------------------------
|  1 | http://first.link;http://second.link                   |
|  2 | https://third.link                                     |
|  3 |                                                        |
|  4 | http://fourth.link;http://fifth.link;https://sixth.link|
| ...| ...                                                    |

I wanted to get those URLs with a single query, without using any programming language. Luckily for me I could make a few assumptions to make my work a lot easier:

  • The column contained only urls or empty strings
  • There were a maximum of 5 urls for each row

Diggin in

MySQL does not include a function to split a delimited string. You can define a function yourself for this purpose. However I didn’t have the required privileges on the database to create such function, so I just sticked with subqueries.

The first thing that I tried was a query to count the number of occurrences of semicolons (and therefore of urls) in the urls column:

SELECT id, ROUND((LENGTH(urls) - LENGTH(REPLACE (urls, ";", ""))) / LENGTH(";")) AS count
FROM links

--------------
| id | count |
-------------
|  1 | 1     |
|  2 | 0     |
|  3 | 0     |
|  4 | 2     |
| ...| ...   |

As you can see, there is a subtle little problem here. The query counts the semicolon occurrences, therefore the row with an empty string and one with a single url have the same value. We need to introduce a condition, for example we can fix the query as follows:

SELECT id, ROUND((LENGTH(urls) - LENGTH(REPLACE (urls, ";", ""))) / LENGTH(";")) AS count
FROM links
WHERE urls <> ""

--------------
| id | count |
-------------
|  1 | 1     |
|  2 | 0     |
|  4 | 2     |
| ...| ...   |

Much better! Now we know that the first row contains one semicolon (two urls), the second row contains zero semicolons (one url), and so on.

Now we need to get the actual data. The general idea is to use the count value to extract, in a separate column, only the portion of string that we are interested in. Remember that we have a maximum of 5 urls? That helps us tremendously. Let’s visualize the idea:

-----------------------------------------------------------------------------
| id  | count | first url | second url | third url | fourth url | fifth url |
-----------------------------------------------------------------------------
| 1   | 1     |           |            |           |            |           |
| 2   | 0     |           |            |           |            |           |
| 4   | 2     |           |            |           |            |           |
| ... | ...   |           |            |           |            |           |

We can compute url columns in this way:

  • first url: from the start of the string until the first ; occurrence
  • second url: from the the first ; occurrence to the second
  • third url: from the the second ; occurrence to the third

For this purpose we have at our disposal the SUBSTRING_INDEX(str,delim,count) function

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned.

The final query is the following:

SELECT id, intermediate.count,
IF(intermediate.count >= 0, SUBSTRING_INDEX(urls, ';', 1), NULL) AS first,
IF(intermediate.count >= 1, SUBSTRING_INDEX(SUBSTRING_INDEX(urls, ';', 2), ';', -1), NULL) AS second,
IF(intermediate.count >= 2, SUBSTRING_INDEX(SUBSTRING_INDEX(urls, ';', 3), ';', -1), NULL) AS third,
IF(intermediate.count >= 3, SUBSTRING_INDEX(SUBSTRING_INDEX(urls, ';', 4), ';', -1), NULL) AS fourth,
IF(intermediate.count >= 4, SUBSTRING_INDEX(SUBSTRING_INDEX(urls, ';', 5), ';', -1), NULL) AS fifth
FROM
links
JOIN (
    SELECT id,
    ROUND ((LENGTH(urls) - LENGTH( REPLACE ( urls, ";", "") ) ) / LENGTH(";")) AS count
    FROM links
    WHERE urls <> ""
) AS intermediate
ON links.id = intermediate.id

-----------------------------------------------------------------------------------------------
| id | count | first              | second             | third               | fourth | fifth |
-----------------------------------------------------------------------------------------------
|  1 | 1     | http://first.link  | http://second.link | NULL                | NULL   | NULL  |
|  2 | 0     | https://third.link | NULL               | NULL                | NULL   | NULL  |
|  4 | 2     | http://fourth.link | http://fifth.link  | https://sixth.link  | NULL   | NULL  |
| ...| ...   |                    |                    |                     |        |       |

The icing on the cake would be pivoting the table from columns to rows, filtering the NULL ones. Unfortunately MySQL does not have a specific statement allowing to do this, for example as SQLServer does. So the easiest solution is just to export the resulting data in Atom, SublimeText or in your favorite editor, and later with a few keyboard tricks you should finally have your list of urls.

Feature Flags in ProcessWire

Using Feature Flags in ProcessWire to turn functionalities your website off, without deploying new code. Continue reading

Using ProcessWire InputFieldDateTime via API

Published on December 09, 2017