Graphlab & ODBC

For those out there working with Dato(Graphlab) and trying to setup an ODBC connection to just pull all the data straight into the SFrame, here are some tips I’ve learned from troubleshooting.

What is ODBC?
Open Database Connectivity which is a middleware API to help standardize and simplify access to database management systems.

Connection Pointers:
There are a number of links on odbc setup but it was a little tricky to get it to work with Graphlab, Linux and OSX and Graphlab’s documentation is a little sparse in that area right now.

This is one of the links I found that was helpful for setting up on a Linux machine. The following are the steps I used

  • wget #]/redhat/rhel-[version #]/pgdg-centos-[OS type & #].noarch.rpm
  • Use the package version link from in the wget command above to pull the rpm file that you need. Note, you are setting up the postgres yum server on your computer to run yum install postgres odbc packages after the fact
  • rpm -ivh ./pgdg-[OS type & #].noarch.rpm
  • yum install postgresql[version #]-odbc.[version #]
  • yum install postgresql[version #]-odbc-debuginfo.[verions #]
  • yum install unixODBCl

In the yum install portion, you can combine and separate with spaces each package on one line. You may need to sudo install depending on the role you are logged into the system as and the available permissions. Best practice is to avoid using sudo.

Now that you have the packages installed, update the odbcinist.ini file which should be in /etc/ directory. Sample file contents include:

Description = ODBC for PostgreSQL
Driver = /usr/pgsql-[version #]/lib/
Setup = /usr/lib64/
Driver64 = /usr/pgsql-[version #]/lib/
Setup64 = /usr/lib64/
Database = [database name]
Server = [address for server which if redshift it will look like: ?……]
Port = [port for your setup something like 5432 or 5439]
FileUsage = 1

Settings above can vary. Definitely read up on options and how it relates to your connection setup.

This was a little trickier because the documentation wasn’t as clear. I ended up using homebrew package manager and the following steps worked.

  • brew update
  • brew install unixodbc
  • brew install psqlodbc

Next setup odbc.ini which should be under the /usr/local/Cellar/unixodbc/[version #]/etc/ directory. Sample file contents include:

Description = ODBC for PostgreSQL
Driver = PostgreSQL
Database = [database name]
Server = [address for server which if redshift it will look like: ?……]
Port = [port for your setup]
Protocol = [protocol for your setup]
Debug = 1

Then setup odbcinst.ini which should also be under the /usr/local/Cellar/unixodbc/[version #]/etc/ directory. Sample file contents include:

Description = PostgreSQL ODBC driver
Driver = /usr/local/Cellar/psqlodbc/[version #]/lib/
Setup = /usr/local/Cellar/unixodbc/[version #]/lib/libodbc.2.dylib
Debug = 0
CommLog = 1
UsageCount = 1

The sticky part for getting graphlab odbc connect to work was that I needed path variables to point to the odbc config files. Thankfully I got this idea from this Stackoverflow post. So in the .bash_profile (which should be in your home directory – use ~/ to get there) add the following:

export ODBCINI=/usr/local/Cellar/unixodbc/[version #]/etc/odbc.ini
export ODBCSYSINI=/usr/local/Cellar/unixodbc/[version #]/etc/

Same with Linux, the setup will vary based on your configuring needs. If at first you don’t succeed, keep researching on how to adjust.

At this point you can go into a python or Ipython kernal and try:

  • import graphlab
  • graphlab.connect_odbc(“Driver=PostgreSQL;Server=[server address like above];Database=[database name];UID=[username];PWD=[password]”)

For some reason even though the parameters in the connection string are defined in the odbcinst.ini config files, Graphlab complains that the string is missing data without them. Specifically, you need to include Driver, Server, Database, UID and PWD. Its good security to pass in your password at least as a variable that comes form a config file and/or the environment.

Once the odbc connection worked, it made the data product run so much more effectively. I’m able to pull the data directly into the package that will build the model and stripped out an extra step that previously existed to query the data into a middle storage before loading it to the package that would train the model. There are other tools out there like that coming into wider use to cut to the chase regarding data processing and machine learning. Spark is one such tool that I’m especially interested in and will try to write about in the future.

4 thoughts on “Graphlab & ODBC

  1. Evan


    I worked on GraphLab Create’s ODBC connector, so I couldn’t help but notice your post. Thanks for putting this out there, and I’m glad you finally got it working! Admittedly ODBC has a bit of a learning curve, but it’s quite useful once you jump its hurdles. Just a few notes:

    1. I’m surprised you had to set ODBCINI and ODBCSYSINI. Did you have to do this on both Linux and OSX?

    2. Generally, settings like Database, Server, and Port are put in odbc.ini, and not odbcinst.ini. This could explain why connect_odbc required you to re-specify these settings. odbc.ini is where DSN’s live, so when using them, you wouldn’t specify “Driver=…” in the connection string, and you would specify “DSN=…”. We don’t do anything special here with the connection string, we just pass it to unixODBC.

    3. You mentioned our documentation on this is a little sparse. I’m just wondering if you found this:
    Either way, if you found this page and didn’t find it useful enough, or didn’t find it, we’d love to know.

  2. nyghtowl Post author

    Thanks for the response and the additional insights. When I was looking for details on Graphlab’s site last year (when I was initially setting this up), I did find that page, but it appears to be updated with more details which is great. ODBC is very tricky, and I only needed to setup odbcinist.ini on Linux to get it to work. The directions that I found to be most helpful with Mac pointed to setting up both files and since it worked, I went with it. Good to know about the odbc.ini is only needed. If I get time I’ll try to double check that in my configuration.

  3. chaslemley

    Thank you for this post. I ran into the same issue when trying to connect to redshift. I found this helpful post:

    I believe you need to change “Server” to “Servername” in your odbc.ini file.

    Here are the steps I took on OSX:

    1. install unixodbc and psqlodbc using homebrew.
    2. Use `odbcinst -j` to determine the DRIVERS file (for me it is /usr/local/Cellar/unixodbc/2.3.2_1/etc/odbcinst.ini). Edit that file to look like this:
    3. Use `odbcinst -j` to determine the USER DATA SOURCES file (for me it is /Users/myuser/.odbc.ini). Edit that file to look like: Notice “Servername”. For other dbs (mysql, etc) I believe Server works, but Postgres requires Servername.
    4. Use isql to test the setup. `isql redshift db_user db_password` Where redshift is the name defined in .odbc.ini above.
    5. Connect from graphlab.
    import graphlab
    db = graphlab.connect_odbc(“DSN=redshift;UID=db_user;PASSWORD=db_password”)
    6. Profit!

    This is a great way to use graphlab. I was previously using the psql command line tool to export csv files.



Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s