Using PostgreSQL Arrays with Golang

PostgreSQL comes with support for the SQL99 standard Array Types. Used correctly, these can result in simpler schemas and faster queries.

PostgreSQL arrays also translate well into Go. Read on to learn more about array types and the cool things you can do with it, and how to work with them in Go.

Array Types in SQL

In Postgres, you can have a single column contain a bunch of values of the same type, quite similar to arrays in programming languages. Here is how they are represented literally:

test=# SELECT '{1,2,3}'::int[] ; -- an array of integers
  int4
---------
 {1,2,3}
(1 row)

test=# SELECT '{"hello","world"}'::text[] ; -- an array of strings
     text
---------------
 {hello,world}
(1 row)

In Postgres syntax, the part '{1,2,3}' is a plain string, :: is the Postgres typecast operator, and int[] is the type (an array of integers). The expression '{1,2,3}'::int[] reads as “typecast the string {1,2,3} into an array of integers”.

The string notation of curly brackets and comma-separated values is an array value constructor recognized by Postgres.

There is also an alernate syntax, which is less explicit about the element type:

test=# SELECT array[1,2,3];
  array
---------
 {1,2,3}
(1 row)

test=# SELECT array['hello', 'world'];
     array
---------------
 {hello,world}
(1 row)

test=# SELECT array[1, 'world'];
ERROR:  invalid input syntax for integer: "world"
LINE 1: SELECT array[1, 'world'];

You can have arrays of any basic type (like boolean, timestamp, text, and so on). Multi-dimensional arrays are also possible, although we don’t cover them in this blog post.

Here is how you can have an array as a column type in a table:

test=# CREATE TABLE posts (
test(#     title text NOT NULL PRIMARY KEY,
test(#     tags  text[]
test(# );
CREATE TABLE

Like any other column type, the array column type can also be nullable or not, and can have a default value, or not. For example, if you want the default value to be an empty array (array with no elements):

test=# CREATE TABLE posts2 (
test(#     title text NOT NULL PRIMARY KEY,
test(#     tags  text[] NOT NULL DEFAULT '{}'::text[]
test(# );
CREATE TABLE

To insert a row into this table, you can use a plain string as the value for tag:

test=# INSERT INTO posts (title, tags) VALUES
test-# ('Using PostgreSQL Arrays with Golang', '{"postgres","golang"}');
INSERT 0 1

Postgres will cast the string to the column type, letting you omit the explicit cast.

Using Go

The standard Postgres database/sql driver for Go is lib/pq. (There are others, like pgx and go-pg, which we don’t cover here.) Here’s is how you’ll connect to a database using lib/pq:

package main

import (
    "database/sql"
    "log"

    "github.com/lib/pq"
)

func main() {
    db, err := sql.Open("postgres", "host=172.16.2.100 dbname=test")
    if err != nil {
        log.Fatal(err)
    }

    // .. use db here ..

    db.Close()
}

To insert a row that contains an array value, use the pq.Array function like this:

// "ins" is the SQL insert statement
ins := "INSERT INTO posts (title, tags) VALUES ($1, $2)"

// "tags" is the list of tags, as a string slice
tags := []string{"go", "goroutines", "queues"}

// the pq.Array function is the secret sauce
_, err = db.Exec(ins, "Job Queues in Go", pq.Array(tags))

To read a Postgres array value into a Go slice, use:

func getTags(db *sql.DB, title string) (tags []string) {
    // the select query, returning 1 column of array type
    sel := "SELECT tags FROM posts WHERE title=$1"

    // wrap the output parameter in pq.Array for receiving into it
    if err := db.QueryRow(sel, title).Scan(pq.Array(&tags)); err != nil {
        log.Fatal(err)
    }

    return
}

Note that in lib/pq, only slices of certain Go types may be passed to pq.Array(). For more information, see the docs.

Selecting

Let’s see some examples of how to use arrays for reading back information from our blog posts table. Here are the contents of the table which we’re working with:

test=# select * from posts;
                 title                 |            tags
---------------------------------------+----------------------------
 Job Queues in Go                      | {go,goroutines,queues}
 Logical Replication in PostgreSQL 10  | {postgres,replication}
 Writing PostgreSQL Triggers in Go     | {postgres,triggers,go}
 Monitoring MySQL Servers with OpsDash | {mysql,monitoring,opsdash}
(4 rows)

Get all posts and their tag count

The array_length function gives the number of elements in the array. This function can work with multi-dimensional arrays also, so we need so specify the dimension number also (always 1 for one-dimensional arrays):

test=# SELECT title, array_length(tags, 1) FROM posts;
                 title                 | array_length
---------------------------------------+--------------
 Job Queues in Go                      |            3
 Logical Replication in PostgreSQL 10  |            2
 Writing PostgreSQL Triggers in Go     |            3
 Monitoring MySQL Servers with OpsDash |            3
(4 rows)

You can also use the cardinality function to get the same result in this case:

test=# SELECT title, cardinality(tags) FROM posts;
                 title                 | cardinality
---------------------------------------+-------------
 Job Queues in Go                      |           3
 Logical Replication in PostgreSQL 10  |           2
 Writing PostgreSQL Triggers in Go     |           3
 Monitoring MySQL Servers with OpsDash |           3
(4 rows)

Get all posts tagged “postgres”

The ‘<@’ operator can be used to check for array membership. It is used like A <@ B where A and B are arrays of the same type, and returns true if all elements of array A are present in array B. We can use this to get a list of all posts that are tagged “postgres”:

test=# SELECT title, tags FROM posts WHERE '{"postgres"}' <@ tags;
                title                 |          tags
--------------------------------------+------------------------
 Logical Replication in PostgreSQL 10 | {postgres,replication}
 Writing PostgreSQL Triggers in Go    | {postgres,triggers,go}
(2 rows)

The left hand side of the operator has to be an array, so we construct an array with the single element “postgres”.

Doing this from Go goes like this:

// do the query
sel := "SELECT title FROM posts WHERE $1 <@ tags"
tags := []string{"postgres"}
rows, err := db.Query(sel, pq.Array(tags))
if err != nil {
    log.Fatal(err)
}
defer rows.Close()

// iterate over the result and print out the titles
for rows.Next() {
    var title string
    if err := rows.Scan(&title); err != nil {
        log.Fatal(err)
    }
    fmt.Println(title)
}
if err := rows.Err(); err != nil {
    log.Fatal(err)
}

Get all posts tagged “postgres” and “go”

If we specify more than one element for the left hand side array, all those elements need to be present in the right hand side for the operator to report a match. This means that if we need to list all blogs that are tagged both postgres and go, we can do:

test=# SELECT title, tags FROM posts WHERE '{"postgres", "go"}' <@ tags;
               title               |          tags
-----------------------------------+------------------------
 Writing PostgreSQL Triggers in Go | {postgres,triggers,go}
(1 row)

The corresponding change to Go code is simply:

sel := "SELECT title FROM posts WHERE $1 <@ tags"
tags := []string{"postgres", "go"}                 // <-- two elements now
rows, err := db.Query(sel, pq.Array(tags))

Get all posts tagged either “postgres” or “go”

So how do we do an “OR” then? We can use the && operator, which when used as A && B returns true when A and B have one or more elements in common.

test=# SELECT * FROM posts WHERE tags && '{"postgres", "go"}';
                title                 |          tags
--------------------------------------+------------------------
 Job Queues in Go                     | {go,goroutines,queues}
 Logical Replication in PostgreSQL 10 | {postgres,replication}
 Writing PostgreSQL Triggers in Go    | {postgres,triggers,go}
(3 rows)

The Go code changes only in the query:

sel := "SELECT title FROM posts WHERE tags && $1" // <-- only query changes
tags := []string{"postgres", "go"}
rows, err := db.Query(sel, pq.Array(tags))

Get all unique tags

The function unnest unrolls an array into rows. This is best explained with an example:

test=# SELECT title, unnest(tags) FROM posts;
                 title                 |   unnest
---------------------------------------+-------------
 Job Queues in Go                      | go
 Job Queues in Go                      | goroutines
 Job Queues in Go                      | queues
 Logical Replication in PostgreSQL 10  | postgres
 Logical Replication in PostgreSQL 10  | replication
 Writing PostgreSQL Triggers in Go     | postgres
 Writing PostgreSQL Triggers in Go     | triggers
 Writing PostgreSQL Triggers in Go     | go
 Monitoring MySQL Servers with OpsDash | mysql
 Monitoring MySQL Servers with OpsDash | monitoring
 Monitoring MySQL Servers with OpsDash | opsdash
(11 rows)

As you can see, each tag from each row in “posts” expands into its own row, duplicating the other columns (“title” here) as required. You can use unnest to get only the unique tags:

test=# SELECT DISTINCT unnest(tags) FROM posts;
   unnest
-------------
 go
 postgres
 monitoring
 replication
 opsdash
 goroutines
 triggers
 queues
 mysql
(9 rows)

And since we’re here, let’s see one more trick: rolling up those rows back into an array:

test=# SELECT array(SELECT DISTINCT unnest(tags) FROM posts);
                                     array
-------------------------------------------------------------------------------
 {go,postgres,monitoring,replication,opsdash,goroutines,triggers,queues,mysql}
(1 row)

Can we get the information required to build a tag cloud? A tag cloud needs each tag, and the number of posts that are tagged with it. In SQL terms, we basically need to do a count over a grouping by tag. We can do this using the previous query as a CTE:

test=# WITH p AS (SELECT title, unnest(tags) AS tag FROM posts)
test-# SELECT tag, count(tag) FROM p GROUP BY tag;
     tag     | count
-------------+-------
 go          |     2
 postgres    |     2
 monitoring  |     1
 replication |     1
 opsdash     |     1
 goroutines  |     1
 triggers    |     1
 queues      |     1
 mysql       |     1
(9 rows)

Updates

Let’s see some ways to modify these array values.

Mark all existing posts as legacy

The array_append function returns a new array with the given elements appended to it. We can use this to add another tag to each of our post:

UPDATE posts SET tags = array_append(tags, 'legacy');

Rename tags

You can also search and replace elements in an array using the array_replace function. If we wanted to rename the tag “go” to “golang” in all posts, we can do:

test=# UPDATE posts SET tags = array_replace(tags, 'go', 'golang');
UPDATE 4

Note that this updated all 4 rows, even though only 2 rows contained “go”. The other rows were touched, even though their values did not change. This was because we didn’t have a WHERE clause for the UPDATE. Let’s update only the required ones:

test=# UPDATE posts
test-# SET    tags = array_replace(tags, 'go', 'golang')
test-# WHERE  '{"go"}' <@ tags;
UPDATE 2

That updates only 2 rows, as required.

Other Array Tips and Tricks

Here are a couple more things about arrays that’s useful to know:

NULL values in an array

Array elements can be NULL, and there is no way to declare an array as containing only “not null” elements. It’s a bit of a pain to work with NULLs in arrays, and is best avoided.

UPDATE posts SET tags = '{"bug",null}' WHERE title = 'The Metamorphosis';

From Go, here is how to read arrays that may contain NULLs:

func getTagsMaybeNull(db *sql.DB, title string) (tags []sql.NullString) {
	sel := "SELECT tags FROM posts WHERE title=$1"
	if err := db.QueryRow(sel, title).Scan(pq.Array(&tags)); err != nil {
		log.Fatal(err)
	}
	return
}

You’ll have to check each element of tags to see if it is .Valid while processing the result.

Using “ANY” instead of “IN”

Here are two ways of selecting posts written by any of a specific set of authors:

-- using IN
SELECT title FROM posts WHERE author IN ('a', 'b', 'c');

-- using arrays and ANY
SELECT title FROM posts WHERE author = any('{"a", "b", "c"}');

Assuming that the set of authors is known only at run-time (say selected from the UI from a list), how would you form this query in Go?

With the first query, you’d have to escape each value and form the SQL string, which would be inelegant and error-prone. The second query, however, can be written like this:

func getTitles(authors []string) {
    q := "SELECT title FROM posts WHERE author = any($1)"
    rows, err := db.Exec(q, pq.Array(authors))
    // ...
}

Array elements as foreign keys

Having an array of primary keys that reference another table would actually be pretty helpful in some cases, but Postgres does not support it. You can store the keys as array elements, but you’ll have to “manually” update the array when the referenced rows in the slave table are deleted.

Array Aggregates

The array_agg function can be used in aggregate queries to create arrays out of aggregated input. See the docs for more info.

Multi-dimensional Arrays

Arrays can have more than one dimension. See this page for an overview, and check out other functions here that would be useful in working with them.

Indexing Arrays

Indexes can be created using array expressions. Not all operators (like ‘<@’, ‘&&’ etc) are support by all index types. See here for more info.

Monitoring Your PostgreSQL servers

Since you’re here, you might be interested in our server and database monitoring product OpsDash. It’s written entirely in Golang and can be used to monitor servers, services, app metrics, and databases, including PostgreSQL.

With OpsDash, you can quickly start monitoring your PostgreSQL servers, and get instant insight into key performance and health metrics including WAL archiving and streaming replication.

Additionally, each OpsDash Smart Agent includes the industry-standard statsd interface (and even a graphite interface) to easily report custom metrics.

New Here?

OpsDash is a server monitoring, service monitoring, and database monitoring solution for monitoring Docker, MySQL, PostgreSQL, MongoDB, memcache, Redis, Apache, Nginx, Elasticsearch and more. It provides intelligent, customizable dashboards and rule-based alerting via email, HipChat, Slack, PagerDuty, OpsGenie, VictorOps and Webhooks. Send in your custom metrics with StatsD and Graphite interfaces built into each agent.