Sync Data From JDBC Sources

Intro

This is an end-to-end guide about migrating tables from JDBC sources (MySQL, PostgreSQL, etc.) to iomete and display it in the BI dashboard.

First, you need to establish an SSH tunnel between iomete and your database in your private network. See Database Connection Options

Given database to migrate

Let's assume that we want to replicate the MySQL database (or any other supported JDBC database) to the iomete warehouse

In this tutorial, we will be using a publicly accessible iomete-tutorial database instance that contains the Employees Sample Database.
In case of connecting to your own database instance see Database Connection Options for the details‌

Here are the details of iomete-tutorial public database:

Host: iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com
Port: 3306
Username: tutorial_user
Password: 9tVDVEKp

The database contains the following tables:

Table name

Row count

employees

300024

departments

9

dept_manager

24

dept_emp

331603

titles

443308

salaries

2844047


Create warehouse

‌Create a new warehouse instance

Enter a caption for this image (optional)

Querying  Source Table

After having the warehouse created, we create a table using JDBC Sources using CREATE TABLE command. In the OPTIONS part we specify credentials of the database to which we want to connect as follows (see JDBC Sources):

CREATE TABLE IF NOT EXISTS employees_proxy
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",
  dbtable "employees.employees",
  user 'tutorial_user',
  password '9tVDVEKp'
);

SELECT * FROM employees_proxy limit 100;
This table doesn't hold the actual data. Data will be retrieved from the actual source once we query the table
Query Editor
Enter a caption for this image (optional)

Migrating Data

To move the data from the source to the warehouse, you can use one of the following options:

Option 1. Create a table from select

-- Create table directly from the query
CREATE TABLE employees USING delta 
  AS SELECT  * FROM employees_proxy;

-- To inspect the table use the following query
DESC TABLE EXTENDED employees;

Option 2. Insert into to existing table

--just append data
INSERT INTO employees
   SELECT  * FROM employees_proxy

--or you can use the follwing command to overwerite data

--first clean an existing data and then insert new data
INSERT OVERWRITE TABLE employees
   SELECT  * FROM employees_proxy

Option 3. Merge with existing data

MERGE INTO employees
 USING (SELECT  * FROM employees_proxy) updates
 ON employees.emp_no = updates.emp_no
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED
  THEN INSERT *

Visualize Data

Let's move employees.salaries before moving to BI visualization

CREATE TABLE IF NOT EXISTS salaries_proxy
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",
  dbtable "employees.salaries",
  user 'tutorial_user',
  password '9tVDVEKp'
);

CREATE TABLE salaries USING delta 
  AS SELECT  * FROM salaries_proxy;

Create a view joining employees and salaries tables

CREATE OR REPLACE VIEW employee_salaries AS 
 SELECT e.emp_no, e.first_name, e.last_name, e.gender, s.salary FROM employees e
  JOIN salaries s ON e.emp_no = s.emp_no;

Open BI Application

Enter a caption for this image (optional)

To add a new database connection, choose Databases link from the Data menu and click the +Database button:

Enter a caption for this image (optional)

Choose Database Type

Here you need to choose Apache Hive from the dropdown:

Enter a caption for this image (optional)

Replace iomete_username and warehouse_name with your values accordingly

hive://<iomete_username>:XXXXXXXXXX@<warehouse_name>-warehouse-thriftserver:10000/?auth=CUSTOM&transport_mode=http
Enter a caption for this image (optional)

Add new dataset

To add a new dataset, choose Dataset link from the Data menu and click the + Dataset button:

Enter a caption for this image (optional)

Create a new chart

Click on the newly created dataset employee_salaries which opens chart view. Let's create a table visualization for the Top 10 High Salary Employees

Enter a caption for this image (optional)

Save the chart to a dashboard

Enter a caption for this image (optional)

Create another chart. This time Female/Male Salary Distribution using PieChart visualization

Enter a caption for this image (optional)

Save this chart to the dashboard too and navigate to the dashboard. And, here is the dashboard of the Employees that we just created

Enter a caption for this image (optional)

Congratulations !


Bonus part

There is a dedicated python library to help to automate this table replication with just a configuration. Please, check out Syncing JDBC Sources

Subscribe to iomete

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe