Redshift’s UDFs (User Defined Functions) permit to execute, with some limitations, certain Python libraries and custom code.

In my case, I wanted to find a way to extract the country code from a phone number in E.164 format. UDFs are a perfect fit for this, the implementation in SQL would most certainly require creating custom views, and hacking your way around, while in python, we can use the library phone-iso3166

>>> from phone_iso3166.country import *
>>> phone_country('+1 202-111-1111')
'US'
>>> phone_country('+34645678901')
'ES'

To upload a library to Redshift, we first need to check it follows the structure:

directory/
    __init__.py
    extra_files.py
    subdirectory/
        __init__.py
        other_files.py

In our case, phone-iso3166 is already in that structure. Now we need to zip the library:

tar -xvzf phone-iso3166.tar.gz
zip -r phone-iso3166.zip phone-iso3166

With our zipped library, we need to upload it to S3. I did this part manually into a bucket named s3://redshift/custom-udfs/

Now, connect to Redshift and issue:

CREATE LIBRARY phone_iso3166 LANGUAGE plpythonu 
FROM 's3://redshift/custom-udfs/phone_iso3166.zip' 
CREDENTIALS 'aws_access_key_id=<your-aws-key>;aws_secret_access_key=<your-aws-pass>' 
region as '<your-region>';

Last step:

CREATE OR REPLACE FUNCTION udf_phone_country (phone_number VARCHAR(64)) RETURNS VARCHAR(64) IMMUTABLE as $$ 
from phone_iso3166.country import phone_country, InvalidPhone
try:
    return phone_country(phone_number)
except:
    return None
$$ LANGUAGE plpythonu;

You should be all set to use this in your queries:

SELECT 
analytics.udf_phone_country('+14151111975'),
analytics.udf_phone_country('+7652112311'),
analytics.udf_phone_country('+34626472918')

That returns:

"US","RU","ES"

 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.