Metabase: making database content accessible to all
Accessing the content of databases can be a very useful technical diagnostic tool but also a way to drive projects or growth of a comany.
Having a proper BI tool is essential unfortunately, smaller companies don't always have the budget for such a tool making Metabase the ideal solution: it has paid Cloud-based and on-premise offers but also has a free open source on-premise solution. We'll have a look at a few possible setup possibilities and have a quick look on basic usage.
- Official website: [https://www.metabase.com/]
- Documentation: https://www.metabase.com/docs/latest/
- Docker images: https://hub.docker.com/r/metabase/metabase
Setup up Metabase
Metabase itself uses a database to store information so in order to deploy it, we need a MySQL/MariaDB or Postgres database and dedicated user for the Metabase application.
Separately, connection and query rights need to be set up for all the databases we wan't to access through Metabase.
There is nothing stopping us from hosting our Metabase database on a server hosting other databases that will be accessible through Metabase but it could cause a security issue: the DB acccess for Metabase to get database content doesn't need UPDATE or INSERT rights, only CONNECT and SELECT but it still means that anyone accessing the Metabase tables could easily get credentials to access other databases on the same server.
Having a Metabase database on a separate server provides security through asset containment.
Basic option: using an already existing database server
Using an existing SQL server, we'll set up a Metabase database. Then use Metabase to analyse the other database on the same server and other DB servers.
Pre-requisite
A database & user must be created:
PostgreSQL | MySQL |
---|---|
CREATE USER metabaseadmin with password 'mJ5jz9M'; |
CREATE DATABASE metabase; |
In our example below, we will use PSQL with a server using IP 192.168.22.22.
So we'll also update the PG_HBA file accordingly and restart the service/cluster.
For example
# TYPE DATABASE USER ADDRESS METHOD
host metabase metabaseadmin 192.168.1.1/16 md5
host metabase metabaseadmin 192.168.1.1/16 scram-sha-256
This allows our DB user metabaseadmin to connect from any server in the 192.168.XXX.XXX network to the database metabase.
Files and folders configuration & setup
For this demonstration, a sub-directory metabase has been created in /opt containing:
File | docker-compose.yml | File at the root of the directory containing our running image configuration (see below) |
---|---|---|
Folder | metabase-data |
Folder to store Metabase files. Will be created by the container deployment. |
docker-compose.yml
version: "3.8"
services:
metabase:
image: metabase/metabase:v0.47.4
restart: always
container_name: metabase
ports:
- 3000:3000
volumes:
# declare your mount volume /host/dir:/container/dir
- /opt/metabase/metabase-data:/metabase-data
environment:
MB_DB_TYPE: postgres
MB_DB_DBNAME: metabase
MB_DB_PORT: 5432
MB_DB_USER: metabaseadmin
MB_DB_PASS: mJ5jz9Ms
MB_DB_HOST: 192.168.22.22
Once you've saved your docker-compose.yml file, simply run:
cd /opt/metabase
sudo docker-compose up -d
If your database access is properly set up and the firewall for your application server allows the retrieval of the dockerhub image, your Metabase container should deploy.
Because the database content needs to be created, this could take a few minutes.
You can check the statut using:
sudo docker logs metabase -f
Or use a browser to connect to your application server on port 3000 where you will have a status indication:
And once the database is ready, you'll have:
With a linkg to the startup guide for the version deployed (here, v0.47).
You can now start using Metabase by clicking on the Let's get started button.
Major issue with this setup: in this scenario if there is a maintenance on the DB server hosting Metabase's database, the service will not be available.
Better option: using a dedicated database server
If we want to properly segregate the Metabase database and the databases we will be exploring, the easiest way is setting up a dedicated DB server.
Below is an example deploying a PostgreSQL server alongside the Metabase container.
The 2 main differences are:
- our docker-compose will deploy 2 services: postgres-db as well as Metabase.
-
to simplify things, we wil have to add some SQL script to create the SQL database & user for Metabase
Files and folders configuration & setup
For this demonstration, a sub-directory metabaseselfsuffisant has been created in /opt containing:
File | docker-compose.yml | File at the root of the directory containing our running image configuration (see below) |
---|---|---|
Folder | metabase-data |
Folder to store Metabase files. Will be created by the container deployment. |
Folder | configs |
Folder containing the Dockerfile and SQL script used to create Metabase DB & user |
File | Dockerfile | Lists the instructions used to build the custom image |
File | init.sql | File that will be copied to the folder /docker-entrypoint-initdb.d/ and executed on container deployment for automation purposes. Without it, we would need to manually create the Metabase DB & users within the DB container. |
In this scenario, Dockerfile is used for the PSQL service deployment only and will only contain minimal information:
FROM postgres:16.0-bullseye
COPY init.sql /docker-entrypoint-initdb.d/
The init.sql file content is the SQL instruction for user and database creation :
create user metabaseadmin with password 'mJ5jz9Ms';
create database metabase OWNER metabaseadmin;
grant all privileges on database metabase to metabaseadmin;
docker-compose.yml
version: "3.8"
services:
postgres-db:
build:
context: ./configs
dockerfile: Dockerfile
restart: always
container_name: postgres
ports:
- 5432:5432
environment:
POSTGRES_PASSWORD: postgres
PGDATA: /var/lib/postgresql/data
volumes:
# declare your mount volume /host/dir:/container/dir
- /opt/metabaseselfsuffisant/pgdata:/var/lib/postgresql/data
metabase:
image: metabase/metabase:v0.47.4
restart: always
container_name: metabase
ports:
- 3000:3000
volumes:
# declare your mount volume /host/dir:/container/dir
- /opt/metabaseselfsuffisant/metabase-data:/metabase-data
environment:
MB_DB_TYPE: postgres
MB_DB_DBNAME: metabase
MB_DB_PORT: 5432
MB_DB_USER: metabaseadmin
MB_DB_PASS: mJ5jz9Ms
MB_DB_HOST: postgres-db
depends_on:
- postgres-db
links:
- postgres-db
Compared to the previous setup, you can see that for Metabase, we have only updated the DB_HOST value.
With this setup, your Metabase is now fully contained and only needs to be able to connect to databases on other DB servers to fetch data for you to analyse.
If there is a maintenance on one of these database servers, no data can be retrieved but Metabase will still be accessible.
In my experience, we had our Metabase instances connected to at least 4 DB servers per Metabase instance.
Set up DB connect & read for Metabase
Add data on initial startup
Once your Metabase platform is available, you can either add data directly in the initial setup:
Add data after initial startup
You can either use the "Add your own data" button on the home page or go to the Admin pannel in the Databases tab:
Connect new database
It will ask for the usual DB credentials and connection information: database, host, schema, SSH or SSL configuration...
Multiple elements can be configured but we'll stick to the basic here and just explore the DB reader user creation.
Creating DB users allowing Metabase to read database content
PostgreSQL | MySQL |
---|---|
-- CREATE user |
-- Create user & grant privileges for a user connecting from any IP |
For PSQL, don't forget to update the PG_HBA file and restart the service or cluster.
Once this is done, Metabase will be able to query the "database_name" DB.
Note: if your DB structure changes with new tables or new schemas being added, you will need to give your Metabase DB reader user the matching rights before these new element can be seen in the Metabase interface.
My ideal setup: making sure you don’t overload your database server
Metabase has a default query setting where it shouldn't cause DB server overload however those settings can be overwritten.
And as usage goes up, more and more queries will be run and you might end up endangering your production DB servers.
For this setup, the changes aren't needed in the Metabase configuration but in the way the database servers are set up: we will be using reading data from replicas of the databases and not the databases themselves.
Replicas or "Slave" DB servers are read-only which means that any data update/insert or user creation must be done on the Primary or "Master" server.
So once the DB reader users are created on the Primary DB servers, they should be available on all replicas of that server.
I've added Traefik in the schema as a bonus.
To see how to set up DB replication, check the dedicated articles.
Conclusions
You should now be able to deploy Metabase without any problem even if it's only to try it out.
The 3 setups given here are only examples of what can be done. They can of course be improved upon.
I will not be re-writing the Metabase documentation which is very well maintained but will be covering some specific functions and uses in other articles so please check them out.