Getting started with iomete & DBT

Quick guide on how to connect dbt (getdbt.com) to the iomete lakehouse platform.

In this post we will discuss using dbt with the iomete Data Platform, and show you how to get started by connecting dbt to iomete lakehouse to bootstrap your dbt journey.

What is DBT?

dbt (data build tool) is a command line tool that enables data analysts and engineers to transform data in their warehouses more effectively. It’s extremely good at transforming data that’s already loaded into your warehouse.

dbt in iomete platform.

iomete & DBT

iomete, is a managed lakehouse platform built on two powerful engines, Apache Spark & Iceberg. To enable our customers to leverage DBT as their transformation layer, we built a new DBT adapter. Here are the links to the dbt-iomete github repository and iomete-dbt profile.

Set up and connect iomete

Prerequisites

To use dbt on iomete platform:

  • Having an iomete account. If you do not have an account yet, you can request a demo from the home page.
  • A running lakehouse cluster.

If the above requirements are met, let's see where to find the dbt connection parameters on the iomete platform.

Connection parameters

User name & password
Use your username and password that you used to log into the iomete platform.

Cluster name
Select your Lakehouse where you want DBT to run.

Lakehouses
NOTE: lakehouse name used as cluster_name in dbt profile setup.

Host
Enter the selected lakehouse and show Connection details, and then Copy JDBC URL.

Lakehouse connection details
JDBC: jdbc:hive2://dwh-910848238944.iomete.com/;transportMode=http;ssl=true;httpPath=reporting/cliservice

Extract the host part from the JDBC URL. Hostname is a combination of  account_number with the prefix dwh- and the suffix .iomete.com.

host: dwh-910848238944.iomete.com

Prepare Data

💡
IDEA: Suppose we have two tables and we’ll use these tables to generate a new table or view using DBT.

Let's prepare these two tables to be used by the dbt project.

1. Create external tables for mysql tutorial database. Which will be used to get some aggregation.

CREATE TABLE IF NOT EXISTS employees
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",
  dbtable "employees.employees",
  driver 'com.mysql.cj.jdbc.Driver',
  user 'tutorial_user',
  password '9tVDVEKp'
);

CREATE TABLE IF NOT EXISTS salaries
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",
  dbtable "employees.salaries",
  driver 'com.mysql.cj.jdbc.Driver',
  user 'tutorial_user',
  password '9tVDVEKp'
);

2. Check data from sql editor.

Our data is ready to use in DBT models to create new tables and views. See below to the DBT model section.

Build your first project

A project is needed to run our DBT activities. We can easily create our new project by following the commands below.

  1. Create a dbt-samples directory.
cd Documents/
mkdir dbt-samples
cd dbt-samples

2. Install dbt-iomete in virtual python environment.

virtualenv .env
source .env/bin/activate
pip install dbt-iomete

3. Initiate the dbt_project project using the init command.

dbt init dbt_project

4. Show the new added profile configuration.

cat ~/.dbt/profiles.yml

5. Navigate into your project's directory.

cd dbt_project

6. Run the debug command from your project to confirm that you can successfully connect.

dbt debug
A successful dbt debug command

Build your first models

A model is a select statement. Models are defined in .sql files typically in your models directory.

  • Each .sql file contains one model / select statement.
  • The name of the file is used as the model name.
💡
REMEMBER: We have two tables employees and salaries. And we want to drive a new table which is going to be join these tables.
  1. Open your project in your favorite code editor.
  2. Create a new SQL file in the models directory, named models/employee_salaries.sql
  3. Paste the following query into the models/employee_salaries.sql file.
SELECT e.emp_no, e.first_name, e.last_name, s.salary
FROM dbt_database.employees e
JOIN dbt_database.salaries s ON e.emp_no = s.emp_no

Note: dbt_project’s default materialization is view and above model going to be create a employee_salaries view.

4. Run the below command from the project's home directory.

dbt run
A successful run with the dbt CLI

5. See the changes on iomete.

employee_salaries view


If you want to create table instead of view, you can change materialization type to table globally from dbt_profile.yml file in dbt_project’s home directory.

models:
  dbt_project:
    example:
      +materialized: table  #<-- here

Or if you only want to make model-specific configuration, you must specify it in the first lines of the model.

{{ config(materialized='table') }}

SELECT e.emp_no, e.first_name, e.last_name, s.salary
FROM dbt_database.employees e
JOIN dbt_database.salaries s ON e.emp_no = s.emp_no

Enter the run command again to execute the new model.

dbt run
A successful run with the dbt CLI

See the changes on iomete.

employee_salaries table

Conclusion

We have covered a quick intro to dbt, and worked through setting up our environment to get dbt connected to iomete Lakehouse.

If you have any questions or feedback related to dbt on the iomete Data Platform, please reach out to us via Slack Channel.

Subscribe to iomete

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe