Snowflake import and sync

Import and update people from a Snowflake database, making sure that people in your workspace reflect the latest information from your CRM or other backend system.

Connecting your Snowflake database to your workspace makes it easy to add and update people on a recurring interval from your backend systems. You can also set this integration to add people to, or update people in, a manual segment, automatically triggering campaigns on each sync interval.

Requirements

As a part of setup, you’ll need to provide the credentials of a database user with read-access to the tables you want to select data from.

If you use a firewall or an allowlist, you must allow the following IP addresses (corresponding to your account region—US or EU), so that we can connect to your database.

Account region IP Address
US 34.122.196.49
EU 104.155.37.221

Query Requirements

When you create a database sync, you provide a query selecting the people and columns you want to import. Each row returned from your query is a person you’ll add or update in Customer.io; each column is an attribute that you’ll set for the people you add or update.

Your query must:

  • SELECT individual columns representing your audience’s id and/or email. Because Snowflake column names are all caps, you must use AS to map these columns to the appropriate lowercase attributesA key-value pair that you associate with a person—like their name, the date they were created in your workspace, etc. Use attributes to target people and personalize messages. in Customer.io.
    SELECT ID as "id", EMAIL as "email" from MY_TABLE
    
  • Be limited to 10,000,000 rows and 50 columns, where each row represents a person and each column represents an attribute. Contact us if you want to import more than ten million records per sync. See optimize your query for help limiting your query.

Best Practices

Before you add this integration, you should take some measures to ensure the security of your customers' data and limit performance impacts to your backend database. The following “best practice” suggestions can help you limit the potential for data exposure and minimize performance impacts.

  • Create a new database user. You should have a database user with minimal privileges specifically for Customer.io import/sync operations. This person only requires read permissions with access limited to the tables you want to sync from.

  • Do not use your main database instance. You may want to create a read-only database instance with replication in place, lightening the load and preventing data loss on your main instance.

  • Sync only the data that you’ll use in Customer.io. Limiting your query can improve performance, and minimizes the potential to expose sensitive data. Select only the columns you care about, and make sure you use the {{last_sync_time}} to limit your query to data that changed since the previous sync.

  • Limit your sync interval so that you don’t overload your database. You should monitor your first few syncs to ensure that you don’t impact your system’s security and performance.

  • Observe regional data regulations. Your data in Customer.io is stored in your account region—US or EU. If your database resides in Europe, but your Customer.io account is based in the US, GDPR and other data regulations may apply. Before you connect your database to Customer.io, make sure that you’re abiding by your regional data regulations.

Add a sync

If you use a firewall or an allowlist, you must allow the following IP addresses (corresponding to your account region—US or EU), so that we can connect to your database.

Account region IP Address
US 34.122.196.49
EU 104.155.37.221
  1. Go to Data & Integrations > Integrations and select Snowflake Data In. You can search for your database type or click Databases to find it.

  2. Click Create Sync

  3. Enter a Name and Description for your database and click Sync settings. These fields describe your database import for other users in your workspace.

  4. Set your sync settings and click Select database.
    1. How often should this import sync? You can sync your database down to the minute, but large, successive sync operations impact your workspace’s performance.
    2. How do you want to identify people? Select whether you want to add and/or update people. If your workspace supports both email and ID as identifiers, select the value you’ll use to identify people—email or id.
    3. Sync these people to a segment?: As a part of each sync, you can add people to a new or existing segment. Use Create a new segment to set up a new segment specifically for your sync and Sync to an existing segment to add people to another segment in your workspace.
    sync settings
    sync settings

  5. Enter a database user’s credentials and click Add database. We suggest that you use someone with read-only credentials for your database.

    While we don’t write to your database, using read-only credentials ensures that you can’t inadvertently make changes to your database through your query.

    When you add your database, we’ll try the connection to make sure your settings are correct. When you’re done, click Write query to move to the next step. If you added a database as a part of another sync operation, you can select it instead of adding a new database.

  6. Enter your query and click Run query to preview up to 100 rows of results.

    Your query:

    • Must SELECT individual columns.
    • Must include columns representing id or email to identify people. You must use AS to rename at least your ID or EMAIL columns to match attributes in Customer.io.
    • Should include a WHERE clause, comparing recent updates against the last_sync_time (Unix epoch timestamp) to limit syncs to the most recent updates.

     Use SELECT * to see available columns

    You can use SELECT * in the Query step to preview the first 100 rows in your query and all available columns. This can help you determine which columns you actually want to select. We may show errors if you need to rename columns using AS.
    input your query
    input your query
  7. Click Review import to review your sync setup.

  8. Click Create sync to start the import process.

Check the status of a sync

The Imports tab for your integration shows recent sync intervals. Click an interval to see how many people you imported, how long the sync operation took to complete, and other information.

Sync operations will show Failed if the query contained any failed rows. While some rows may have synced normally, we report a failure to help you find and correct individual failures. See Import failures for more information.

  1. Go to Data & Integrations > Integrations and select Snowflake Data In
  2. Click the sync you want to check the status of and go to the Imports tab.
    Click a sync to change settings and see details
    Click a sync to change settings and see details

Pause or resume a sync

Pausing a sync lets you skip sync intervals, but doesn’t otherwise change your configuration. If you resume a sync after you pause it, your sync will pick up at its next scheduled interval.

  1. Go to Data & Integrations > Integrations and select Snowflake Data In.
  2. Click next to the sync you want to modify and select Pause. If your sync is paused and you want to resume it, click Activate.
    pause and resume syncs on the MySQL page
    pause and resume syncs on the MySQL page

Update a sync

When you update or change the configuration of a sync, your changes are reflected on the next sync interval.

  1. Go to Data & Integrations > Integrations and select Snowflake Data In.
  2. Click the sync you want to update.
  3. Make your changes. Click between Query and Settings tabs to make changes to different aspects of your sync.
  4. Click Save Changes.

Delete a sync

Deleting a sync stops syncing/updating people from your database using a particular query. It does not delete or otherwise modify anybody you imported or updated from the database with that query.

  1. Go to Data & Integrations > Integrations and select Snowflake Data In.
  2. Click next to your sync and select Delete.

Optimize your query

Because your database sync operates on an interval, you should optimize your query to ensure that we import the right information, quickly, with the least noise. When setting up your query, you should consider:

  • Your database timeout value: Queries selecting large data sets may timeout.
  • Cost: Are you charged per query or for the amount of data returned?
  • Can you narrow your query?: Add a “last_updated” or similar column to tables you import, and index that column. You’ll use this column to select the changeset for each sync.
SELECT id AS "id", email AS "email", firstn AS "first_name" , created AS "created_at"
FROM my_table
WHERE last_updated > {{last_sync_time}}

Last Sync Time

We strongly recommend that you index a column in your database representing the date-time each row was last-updated. When you write your query, you should add a WHERE clause comparing your “last updated” column to the {{last_sync_time}}.

The last sync time is a Unix timestamp representing the date-time when the previous sync started. Comparing a “last-updated” column to this timestamp helps you limit your sync operations to the columns that changed since the previous sync.

If you use ISO date-times, you can convert them to unix timestamps in your query.

SELECT ID as "id", EMAIL as "email", FIRSTN as "first_name", CREATED AS "created_at"
FROM my_table
WHERE UNIX_TIMESTAMP(last_updated) > {{last_sync_time}}

Mapping columns to attributes

We map column names in your query to attributes in your workspace, exactly as formatted in your query. However, columns in Snowflake are all caps and queries are not case sensitive. You must use AS to map at least your ID and EMAIL columns to people-identifiers in Customer.io.

Attributes in Customer.io are generally lowercased. You should make sure all of your columns map to attributes that already exist in your workspace so that you don’t create duplicate attributes with mismatched cases (i.e. FIRST_NAME and first_name).

SELECT ID AS "id", EMAIL AS "email", PRIMARY_PHONE AS "phone"
FROM people
WHERE updated > {{last_sync_time}}

Sync intervals

You can set your workspace to import from your database on a basis of minutes, hours, days, etc. Be mindful of the time frame that you set, in accordance with the general number of people you expect to import with each sync, and how often you truly need to update people in your workspace.

If a sync is still running when the next sync interval is scheduled to begin, we’ll cancel the next sync interval. If this continues to happen, you should change your interval so that sync operations don’t overlap.

We tested sync performance for a MySQL server against an empty workspace with no concurrent operations (API calls, running campaigns, etc) with the following results. Your results may vary if your query is more complex, or your workspace has multiple concurrent, active users during the sync.

You should adjust sync intervals so that you do not sync more often than the average sync time below, and we strongly recommend that you significantly buffer sync intervals to account for concurrent users in your workspace and additional operations (active campaigns, segmentation, and other operations that affect your audience).

Database rows Database columns Average sync time (mm:ss)
100,000 10 4:20
250,000 10 10:36
500,000 10 21:49
750,000 10 31:22
1,000,000 10 40:39

Import failures

Rows that fail to add or update a person report errors. You can find a count of errors with any sync and download a list of errors for failed rows by going to Data & Integrations > Integrations > Snowflake Data In.

If a sync interval contained any failed rows, the operation shows Failed. Rows may still have been imported, but we report a failure so that it’s clear that the sync interval contained at least one failure. Click the row for more information. Click Download to get a CSV file containing errors for each failed row.

Show failures for a sync interval
Show failures for a sync interval

In general, most issues are of the Failed Attribute Change type relating to changes to id or email identifiersThe attributes you use to add, modify, and target people. Each unique identifier value represents an individual person in your workspace.. You are likely to see this error if:

You set an id or email value that belongs to another person. If your workspace identifies people by either email or id, these values must be unique. Attempting to set a value belonging to another person will cause an error.
You attempt to change an id or email value that is already set for a person. You can set an id or email if it is blank; you cannot change these values after they are set in a Sync. You can only change these values from the People page, or when you identify people by cio_idAn identifier for a person that is automatically generated by Customer.io and cannot be changed. This identifier provides a complete, unbroken record of a person across changes to their other identifiers (id, email, etc).), which you cannot use in a Sync.
You set an invalid email value Emails must conform to the RFC 5322 standard. If they do not, you’ll receive an attribute change failure.

FAQ

What databases do you support for import operations? We support MySQL, Postgres (including Amazon Redshift), and Snowflake databases. Contact us if you want to sync with a different database or data warehouse, like BigQuery. To sync from a Redshift instance, use our Postgres integration.
Do you support SSL or TLS connections? We support SSL connections. You can also secure your connection by limiting access to approved IP addresses.
Is there a limit to the number of people I can sync at a time?

You cannot add or update more than 10,000,000 people (rows) at a time. Consider adding a LIMIT and ORDER BY to your query, or using a WHERE clause to limit updates to people who have been added or updated since the {{last_sync_time}}. See optimize your query for more information.

Your query cannot SELECT more than 50 columns, where each column represents an attribute.

Contact us if you want to import more rows or columns.

Copied to clipboard!