Amazon Redshift

Import and update people from an Amazon Redshift database, making sure that people in your workspace reflect the latest information from your CRM or other backend systems.

Connecting your Amazon Redshift database to your workspace makes it easy to add and update people on a recurring interval from your backend systems. You can also set this integration to add people to, or update people in, a manual segment, automatically triggering campaigns on each sync interval.

Requirements

We support both SSL and non-SSL database connections. As a part of setup, you’ll need to provide the credentials of a database user with read access to the tables you want to select data from.

If you use a firewall or an allowlist, you must allow the following IP addresses (corresponding to your account region—US or EU), so that we can connect to your database.

Account region    IP address
US                34.122.196.49
EU                104.155.37.221

Query Requirements

When you create a database sync, you provide a query selecting the people and columns you want to import. Each row returned from your query is a person you’ll add or update in Customer.io; each column is an attribute that you’ll set for the people you add or update.

Your query must:

  • Select at least one column representing a person’s identifier—email and/or id, depending on your workspace settings.

    Your query can only use SELECT * if the table you import from contains id or email in the appropriate case. If a column does not map directly to an identifier, you’ll receive an error, and you’ll need to rewrite your query to select individual columns.

  • Be limited to 10,000,000 rows and 300 columns, where each row represents a person and each column represents an attribute. Contact us if you want to import more than 10,000,000 records per sync. See optimize your query for help limiting your query.
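
One way to check a query against the row limit before you set up a sync is to count the rows it would return. The sketch below assumes a hypothetical table named my_table with a last_updated timestamp column; both names are placeholders, and the one-day window is only a stand-in for your planned sync interval.

```sql
-- Hypothetical table and column names; adjust to your schema.
-- Counts rows changed in the last day as a rough proxy for one sync's volume.
SELECT COUNT(*) AS row_count
FROM my_table
WHERE last_updated > DATEADD(day, -1, GETDATE());
```

If row_count comes close to the row limit, consider narrowing the WHERE clause or syncing more often so that each sync covers a smaller changeset.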

Best Practices

Before you add this integration, you should take some measures to ensure the security of your customers’ data and limit performance impacts to your backend database. The following “best practice” suggestions can help you limit the potential for data exposure and minimize performance impacts.

  • Create a new database user. You should have a database user with minimal privileges specifically for Customer.io import/sync operations. This user only requires read permissions with access limited to the tables you want to sync from.

  • Do not use your main database instance. You may want to create a read-only database instance with replication in place, lightening the load and preventing data loss on your main instance.

  • Sync only the data that you’ll use in Customer.io. Limiting your query can improve performance, and minimizes the potential to expose sensitive data. Select only the columns you care about, and make sure you use the {{last_sync_time}} to limit your query to data that changed since the previous sync.

  • Limit your sync interval so that you don’t overload your database. You should monitor your first few syncs to ensure that you don’t impact your system’s security and performance.

  • Observe regional data regulations. Your data in Customer.io is stored in your account region—US or EU. If your database resides in Europe, but your Customer.io account is based in the US, GDPR and other data regulations may apply. Before you connect your database to Customer.io, make sure that you’re abiding by your regional data regulations.
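
As an illustration of the first suggestion, a dedicated read-only user in Amazon Redshift might be created along these lines. This is a minimal sketch: the user name cio_sync, the schema public, and the table people are all hypothetical, and the password shown is a placeholder that you should replace.

```sql
-- Hypothetical names throughout; replace the placeholder password.
CREATE USER cio_sync PASSWORD 'Replace-With-A-Strong-Passw0rd';

-- Allow the user to see objects in the schema, then read only one table.
GRANT USAGE ON SCHEMA public TO cio_sync;
GRANT SELECT ON TABLE public.people TO cio_sync;
```

Granting SELECT on individual tables, rather than on the whole schema, keeps the user's access limited to the data you intend to sync.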

Add a sync

  1. Go to Data & Integrations > Integrations and select Amazon Redshift Data In. You can search for your database type or click Databases to find it.

  2. Click Set up sync.

  3. Enter a Name and Description for your database and click Sync settings. These fields describe your database import for other users in your workspace.

  4. Set your sync settings and click Select database.
    1. How often should this import sync? Intervals are based on the end time of the previous sync. If you set an interval of 1 hour, and a sync operation takes a minute, your syncs will occur 1 hour and 1 minute apart. You can sync your database down to the minute, but large, successive sync operations can impact your workspace’s performance.
    2. Schedule start time lets you set the date and time when you want to begin syncs.
    3. How do you want to identify people? Select whether you want to add and/or update people. If your workspace supports both email and ID as identifiers, select the value you’ll use to identify people—email or id.
    4. Sync these people to a segment?: As a part of each sync, you can add people to a new or existing segment. Use Create a new segment to set up a new segment specifically for your sync and Sync to an existing segment to add people to another segment in your workspace.

  5. Enter a database user’s credentials and click Add database. We suggest that you use a user with read-only credentials for your database.

    While we don’t write to your database, using read-only credentials ensures that you can’t inadvertently make changes to your database through your query.

    When you add your database, we’ll try the connection to make sure your settings are correct. When you’re done, click Write query to move to the next step. If you added a database as a part of another sync operation, you can select it instead of adding a new database.

  6. Enter your query and click Run query to preview up to 100 rows of results.

    Your query:

    • Should SELECT individual columns.
    • Must include columns representing id or email to identify people. If your columns aren't named id or email, you can use AS to map them to attributes in Customer.io.
    • Should include a WHERE clause, comparing recent updates against the last_sync_time (Unix epoch timestamp) to limit syncs to the most recent updates.

     Use SELECT * to see available columns

    You can use SELECT * in the Query step to preview the first 100 rows in your query and all available columns. This can help you determine which columns you actually want to select. We may show errors if you need to rename columns using AS.

  7. Click Review import to review your sync setup.

  8. Click Set up sync to start the import process.

Check the status of a sync

The Imports tab for your integration shows recent sync intervals. Click an interval to see how many people you imported, how long the sync operation took to complete, and other information.

Sync operations will show Failed if the query contained any failed rows. While some rows may have synced normally, we report a failure to help you find and correct individual failures. See Import failures for more information.

  1. Go to Data & Integrations > Integrations and select Amazon Redshift Data In.
  2. Click the sync you want to check the status of and go to the Imports tab.

Pause or resume a sync

Pausing a sync lets you skip sync intervals, but doesn’t otherwise change your configuration. If you resume a sync after you pause it, your sync will pick up at its next scheduled interval.

  1. Go to Data & Integrations > Integrations and select Amazon Redshift Data In.
  2. Click next to the sync you want to modify and select Pause. If your sync is paused and you want to resume it, click Activate.

Update a sync

When you update or change the configuration of a sync, your changes are reflected on the next sync interval.

  1. Go to Data & Integrations > Integrations and select Amazon Redshift Data In.
  2. Click the sync you want to update.
  3. Make your changes. Click between Query and Settings tabs to make changes to different aspects of your sync.
  4. Click Save Changes.

Delete a sync

Deleting a sync stops syncing/updating people from your database using a particular query. It does not delete or otherwise modify anybody you imported or updated from the database with that query.

  1. Go to Data & Integrations > Integrations and select Amazon Redshift Data In.
  2. Click next to your sync and select Delete.

Optimize your query

Because your database sync operates on an interval, you should optimize your query to ensure that we import the right information, quickly, with the least noise. When setting up your query, you should consider:

  • Your database timeout value: Queries selecting large data sets may timeout.
  • Cost: Are you charged per query or for the amount of data returned?
  • Can you narrow your query?: Add a “last_updated” or similar column to tables you import, and index that column. You’ll use this column to select the changeset for each sync.

SELECT id AS "id", email AS "email", firstn AS "first_name", created AS "created_at"
FROM my_table
WHERE last_updated > {{last_sync_time}}

Last Sync Time

We strongly recommend that you index a column in your database representing the date-time each row was last-updated. When you write your query, you should add a WHERE clause comparing your “last updated” column to the {{last_sync_time}}.

The last sync time is a Unix timestamp representing the date-time when the previous sync started. Comparing a “last-updated” column to this timestamp helps you limit your sync operations to the rows that changed since the previous sync.

If you use ISO date-times, you can convert them to Unix timestamps in your query.

SELECT id, email, first_name, created AS created_at
FROM my_table
WHERE extract(epoch from last_updated) > {{last_sync_time}}
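
An alternative sketch converts the sync time into a timestamp instead, so that the comparison is applied to the stored column rather than to a computed value. This assumes that {{last_sync_time}} is substituted as an integer number of seconds and that last_updated is a TIMESTAMP column; my_table and the column names are placeholders.

```sql
-- Assumes {{last_sync_time}} is replaced with an integer Unix timestamp (seconds)
-- and that last_updated is a TIMESTAMP column; my_table is a placeholder name.
SELECT id, email, first_name, created AS created_at
FROM my_table
WHERE last_updated > TIMESTAMP 'epoch' + {{last_sync_time}} * INTERVAL '1 second'
```

Comparing against the raw column may let the database skip more data when the table is sorted on that column, though the benefit depends on your table design.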

Mapping columns to attributes

We map column names in your query to attributes in your workspace, exactly as formatted in your query. However, queries are not case sensitive: if a column in your database is called Email, you can use AS "email" to map the column to the email attribute in your workspace.

Attributes in Customer.io are generally lowercased. We recommend that you rename columns with uppercased characters accordingly.

SELECT id, email, primary_phone AS phone
FROM my_table
WHERE extract(epoch from last_updated) > {{last_sync_time}}

Sync intervals

You can set your workspace to import from your database on a basis of minutes, hours, days, etc. A sync interval begins when the previous sync ends. So, if you set your sync interval for 1 hour, and a sync takes 1 minute, syncs will occur every 61 minutes.

We tested sync performance for a MySQL server against an empty workspace with no concurrent operations (API calls, running campaigns, etc) with the following results. Your results may vary if your query is more complex, or your workspace has multiple concurrent, active users during the sync.

Adjust your sync intervals to provide significant buffer between syncs and account for concurrent users in your workspace or other operations (active campaigns, segmentation, or other operations that affect your audience).

Database rows    Database columns    Average sync time (mm:ss)
100,000          10                  4:20
250,000          10                  10:36
500,000          10                  21:49
750,000          10                  31:22
1,000,000        10                  40:39

Import failures

Rows that fail to add or update a person report errors. You can find a count of errors with any sync and download a list of errors for failed rows by going to Data & Integrations > Integrations > Amazon Redshift Data In.

If a sync interval contained any failed rows, the operation shows Failed. Rows may still have been imported, but we report a failure so that it’s clear that the sync interval contained at least one failure. Click the row for more information. Click Download to get a CSV file containing errors for each failed row.

 If you see Failed Attribute Changes, try changing your workspace settings

Syncs that change a person’s email address can be a frequent source of Failed Attribute Change errors. You can enable the Allow updates to email using ID setting under Settings > Workspace Settings > General Workspace Settings to make it easier to change people’s email values after they are set and avoid Failed Attribute Change errors.


In general, most issues are of the Failed Attribute Change type relating to changes to id or email identifiers (the attributes you use to add, modify, and target people; each unique identifier value represents an individual person in your workspace). You are likely to see this error if:

You set an id or email value that belongs to another person.

If your workspace identifies people by either email or id, these values must be unique. Attempting to set a value belonging to another person will cause an error.

You attempt to change an id or email value that is already set for a person.

You can set an id or email if it is blank; you cannot change these values after they are set in a Sync. You can only change these values from the People page, or when you identify people by cio_id (an identifier for a person that is automatically generated by Customer.io and cannot be changed; it provides a complete, unbroken record of a person across changes to their other identifiers, such as id and email), which you cannot use in a Sync.

You set an invalid email value.

Emails must conform to the RFC 5322 standard. If they do not, you’ll receive an attribute change failure.
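
Before you sync, you may be able to catch some of these cases in your own database. The queries below are a rough sketch, assuming a hypothetical table named my_table with id and email columns. The second query checks only the general shape of an address; it is not a full RFC 5322 validation.

```sql
-- Hypothetical table name; find email values shared by more than one row.
SELECT email, COUNT(*) AS occurrences
FROM my_table
WHERE email IS NOT NULL
GROUP BY email
HAVING COUNT(*) > 1;

-- Rough shape check only (not full RFC 5322 validation):
-- lists emails that lack the form something@domain.tld.
SELECT id, email
FROM my_table
WHERE email IS NOT NULL
  AND email NOT LIKE '%_@_%._%';
```

Rows returned by either query are candidates for cleanup, since they are likely to produce Failed Attribute Change errors.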

FAQ

What other databases do you support for import operations?

In addition to Amazon Redshift, we also support MySQL, Postgres, Microsoft SQL, Google BigQuery, and Snowflake. Contact us if you want to sync with a different database or data warehouse.

Do you support SSL or TLS connections?

We support SSL connections. You can also secure your connection by limiting access to approved IP addresses.

Is there a limit to the number of people I can sync at a time?

You cannot add or update more than 10,000,000 people (rows) at a time. Consider adding a LIMIT and ORDER BY to your query, or using a WHERE clause to limit updates to people who have been added or updated since the {{last_sync_time}}. See optimize your query for more information.

Your query cannot SELECT more than 300 columns, where each column represents an attribute.

Contact us if you want to import more rows or columns.
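
As a sketch of the LIMIT and ORDER BY suggestion, the query below caps each sync at the row limit and returns the oldest changes first. The table and column names are placeholders.

```sql
-- Placeholder table and column names; caps the result at 10,000,000 rows.
SELECT id, email, first_name
FROM my_table
WHERE extract(epoch from last_updated) > {{last_sync_time}}
ORDER BY last_updated
LIMIT 10000000
```

Rows cut off by the LIMIT are not necessarily picked up by the next sync, because the last sync time will have moved forward by then. Where possible, a narrower WHERE clause or a shorter sync interval is likely the more reliable way to stay under the limit.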
