Amazon Redshift Reverse ETL Source

Import data from Amazon Redshift to any destination. This source saves you the trouble of writing code to extract, transform, and load data from your warehouse. Instead, specify the query (or queries) to run, and we’ll handle the rest.

Best Practices

Before you add this integration, you should take some measures to ensure the security of your customers’ data and limit performance impacts to your database. The following “best practice” suggestions can help you limit the potential for data exposure and minimize performance impacts.

  • Create a new database user/service account. You should have a database user with minimal privileges specifically for Customer.io import/sync operations. This person only requires read permissions with access limited to the tables you want to sync from.

  • Avoid using your main database instance. You may want to create a read-only database instance with replication in place, lightening the load and preventing data loss on your main instance.

  • Sync only the data that you’ll use in Customer.io. Limiting your query can improve performance, and minimizes the potential to expose sensitive data. Select only the columns you care about, and make sure you use the {{last_sync_time}} to limit your query to data that changed since the previous sync.

  • Limit your sync frequency so that syncs don’t overlap and you don’t cause extra database traffic. You should monitor your first few syncs to ensure that you don’t impact your system’s security and performance.

Set up a Service Account

When you set up an Amazon Redshift source, you should create a user account that Customer.io will use to connect to Redshift. This user has limited access to your Redshift instance and gives you granular control over access to your Redshift instance.

  1. Log in to Redshift and select the Redshift cluster you want to use as a Data Pipelines source.

  2. Run the commands below to create a user named customerio.

     -- create a user named "customerio"
     CREATE USER customerio PASSWORD '<enter password here>';
    
     -- allows the "customerio" user to create new schemas on the specified database. 
     -- (This is the name you chose when provisioning your cluster)
     GRANT CREATE ON DATABASE "<enter database name here>" TO "customerio";
    

Add your Amazon Redshift source

As a part of this setup, you’ll provide Customer.io with user credentials that we’ll use to query your database. We recommend that you create a new user with Read Only access specifically for Customer.io, so you can manage Customer.io access to your database independent of any other Microsoft SQL users you have.

Make sure your database allows connections from 34.29.50.4. This is the IP address of our reverse ETL service. If this address is blocked by a firewall or blocklist, we won’t be able to connect to your database.

  1. In the Data Pipelines tab, click Sources.

  2. Click Add Source and select Amazon Redshift.

  3. Provide your database information, including credentials to connect to your database, and click Save database.

    • The Name is a friendly name you’ll use to recognize your database whenever you reference it in Customer.io.
    • Enter Host address and the name of the database you want to connect to.
    • Enter a database user’s credentials and click Add database. We suggest that you create a service account and use service account credentials to set up your database.
      connect your database
      connect your database
  4. Set up a Sync. A sync is the type of source data (identify, track, etc) you want to import from your database and click Next: Define Query. You can set up syncs for each type of data you want to import for your source.

    1. Provide a Name and Description for the sync. This helps you understand the sync at a glance when you look at your source Overview later.
    2. Select the type of data you want to import.
    3. Set the Sync Frequency, indicating how often you want to query your database for new data. You should set the frequency such that sync operations don’t overlap. Learn more about sync frequency.
    4. Select when you want to start the sync: whether you want to begin importing data immediately, or schedule the sync to start at a later date.
      add a sync and determine the data you want to capture
      add a sync and determine the data you want to capture

  5. Enter the query that selects the data you want to import. See Queries below for more information about the information you’ll want to select for your sync. Click Run Query to preview results and make sure that your query selects the right information.

    set up your query to sync the right information
    set up your query to sync the right information

  6. Click Enable to enable your sync.

Now you can set up additional syncs and connect your source to one or more destinations.

Adding syncs

After you set up your source, you can add additional syncs to import different types of data from your database. For example, you might want to import identify data for your users, and track data for their actions. Subsequent syncs can rely on your existing database, or you can add another database within your source.

  1. In your source, go to the Syncs tab and click Add Sync.
  2. Select your database or add a new one and click Next: Create Sync.
  3. Set up a syncA sync is the type of source data (identify, track, etc) you want to import from your database. A sync is essentially the type of source call you want to make. and click Next: Define Query. You can set up syncs for each type of data you want to import.
    1. Provide a Name and Description for the sync. This helps you understand the sync at a glance when you look at your source Overview later.
    2. Select the type of data you want to import.
    3. Set the Sync Frequency, indicating how often you want to query your database for new data. You should set the frequency such that sync operations don’t overlap. Learn more about sync frequency.
    4. Select when you want to start the sync: whether you want to begin importing data immediately, or schedule the sync to start at a later date.
      add a sync and determine the data you want to capture
      add a sync and determine the data you want to capture
  4. Enter the query that selects the data you want to import. See Queries below for more information about the information you’ll want to select for your sync. Click Run Query to preview results and make sure that your query selects the right information.
    set up your query to sync the right information
    set up your query to sync the right information
  5. Click Enable to enable your sync.

Sync Frequency

You can sync data as often as every minute. However, we recommend that you set your sync frequency such that sync operations don’t overlap. If you schedule syncs such that a sync operation is scheduled to start while the previous operation is still we’ll skip the next sync operation.

Queries for each sync type

When you create a database sync, you provide a query selecting the people or objects you want to import, and respective attributes. You can query your storage bucket using the same methods in our Data Pipelines API.

When you create a database sync, your query has to account for required fields in a source call. Each query is based around our Data Pipelines APIs. Each row returned from your query represents an individual source operation/API call. Columns represent the traits or properties that you want to apply to the person, group, or event that your sync imports.

See the individual queries below for required fields and examples. Your query cannot return more than 40,000,000 rows and 300 columns.

Identify

The identify method tells Data Pipelines who the current website visitor is, and lets you assign unique traitsA key-value pair that you associate with a person or an object—like a person’s name, the date they were created in your workspace, or a company’s billing date etc. Use attributes to target people and personalize messages. to a person.

You should call identify when a user creates an account, logs in, etc. You can also call it again whenever a person’s traits change. We’ve shown a typical call with a traits object, but we’ve listed all the fields available in an identify call below.

You can send an identify call with an anonymousId and/or userId.

  • anonymousId only: This assign traits to a person before you know who they are.
  • userId only: Identifies a user and sets traits.
  • both userId and anonymousId: Associates the data sent in previous anonymous page, track, and identify calls with the person you identify by userId.
SELECT id AS userId, email_address AS email, fname, lname, msisdn AS phone
FROM users
WHERE last_updated >= {{last_sync_time}}

In most cases, you’ll simply provide userId, anonymousId, and traits, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.

  • anonymousId string
    A unique substitute for a User ID in cases when you don’t have an absolutely unique identifier. Our libraries generate this value automatically to help you track people before they sign up, log in, provide their email, etc.
    • active boolean

      Whether a user is active.

      This is usually used to flag an .identify() call to just update the traits but not “last seen.”

    • channel string
      The channel the event originated from.

      Accepted values:browser,server,mobile

    • ip string
      The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
    • locale string
      The local string for the current user, e.g. en-US.
    • userAgent string
      The user agent of the device making the request
      • content string
      • medium string
        The type of traffic a person/event originates from, like email, or referral.
      • name string
        The campaign name.
      • source string
        The source of traffic—like the name of your email list, Facebook, Google, etc.
      • term string
        The keyword term(s) a user came from.
      • Additional UTM Parameters* string
      • keywords array of [ strings ]
        A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
      • name string
        The name of the page. Reserved for future use.
      • path string
        The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
      • referrer string
        The previous page’s full URL. Equivalent to document.referrer from the DOM API.
      • search string
        The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
      • title string
        The page’s title. Equivalent to document.title from the DOM API.
      • url string
        A page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses location.href from the DOM API.
    • Enabled/Disabled integrations* boolean
  • messageId string
    A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
  • receivedAt string  (date-time)
    The ISO-8601 timestamp when Data Pipelines receives an event.
  • sentAt string  (date-time)
    The ISO-8601 timestamp when a library sends an event to Data Pipelines.
  • timestamp string  (date-time)
    The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
    • createdAt string  (date-time)
      We recommend that you pass date-time values as ISO 8601 date-time strings. We convert this value to fit destinations where appropriate.
    • email string
      A person’s email address. In some cases, you can pass an empty userId and we’ll use this value to identify a person.
    • Additional Traits* any
      Traits that you want to set on a person. These can take any JSON shape.
  • type string
    The event type. This is set automatically by the request method/endpoint.

    Accepted values:identify

  • userId string
    The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
  • version number
    The version of the API that received the event, automatically set by Customer.io.

Track

The track method tells us about actions people take—the events people perform—on your site. Every track call represents an event. When you send an event with a track call, you can provide associated properties about the event. For more information, see the track method reference.

You should track events that are indicators on your site, or the kinds of things that you want to know that your audience does on your website, like Video Viewed, Item Purchased or Article Bookmarked. In some products—like Customer.io Journeys—you can use events to trigger messages and other downstream actions.

You can send events with an anonymousId or a userId. Calls that you make with an anonymousId are associated with a userId when you identify someone by their userId.

Track calls typically contain an event name—so you know what a user did—and a series of properties, which contain additional information about the event. The examples below show added_to_cart events, containing properties that let you know about the product a person added to their shopping cart.

SELECT id AS userId, event_name AS name, products, total_price AS value
FROM events
WHERE timestamp > {{last_sync_time}}

In most cases, you’ll simply provide an ID for the user, the event, and properties, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.

    • active boolean

      Whether a user is active.

      This is usually used to flag an .identify() call to just update the traits but not “last seen.”

    • channel string
      The channel the event originated from.

      Accepted values:browser,server,mobile

    • ip string
      The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
    • locale string
      The local string for the current user, e.g. en-US.
    • userAgent string
      The user agent of the device making the request
      • content string
      • medium string
        The type of traffic a person/event originates from, like email, or referral.
      • name string
        The campaign name.
      • source string
        The source of traffic—like the name of your email list, Facebook, Google, etc.
      • term string
        The keyword term(s) a user came from.
      • Additional UTM Parameters* string
      • keywords array of [ strings ]
        A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
      • name string
        The name of the page. Reserved for future use.
      • path string
        The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
      • referrer string
        The previous page’s full URL. Equivalent to document.referrer from the DOM API.
      • search string
        The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
      • title string
        The page’s title. Equivalent to document.title from the DOM API.
      • url string
        A page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses location.href from the DOM API.
  • event string
    Required The name of the event
    • Enabled/Disabled integrations* boolean
  • messageId string
    A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
    • Event Properties* any
      Additional properties that you want to capture in the event. These can take any JSON shape.
  • receivedAt string  (date-time)
    The ISO-8601 timestamp when Data Pipelines receives an event.
  • sentAt string  (date-time)
    The ISO-8601 timestamp when a library sends an event to Data Pipelines.
  • timestamp string  (date-time)
    The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
  • type string
    Required The event type. This is set automatically by the request method/endpoint.

    Accepted values:track

  • userId string
    Required The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
  • version number
    The version of the API that received the event, automatically set by Customer.io.

Group

The Group method associates an identified person with a group—like a company, organization, project, online class or any other collective noun you come up with for the same concept. In Customer.io Journeys, we call groups objectsNot to be confused with a JSON object, an object in Customer.io is a non-person entity that you can associate with one or more people—like a company, account, or online course. You can use objects to message people based on changes to their company, account, or colege course itinerary..

Group calls are useful for destinations where you maintain relationships between people and larger organizations, like in Customer.io! In Customer.io Journeys, you can store groups as objectsNot to be confused with a JSON object, an object in Customer.io is a non-person entity that you can associate with one or more people—like a company, account, or online course. You can use objects to message people based on changes to their company, account, or colege course itinerary., and trigger campaigns based on a person’s relationship to an object—like an account, online class, and so on.

Find more details about group, including the group payload, in our API spec.

Remember, group calls represent both an organization/group and relationships with users (by userId). Your query should include not only the groupId, but the userId so that you can capture relationships at your destinations.

SELECT companyId AS groupId, companyname, employees, personId AS userId
FROM companies
WHERE last_updated >= {{last_sync_time}}

In most cases, you’ll simply provide an ID for the user, the groupId, and traits, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.

    • active boolean

      Whether a user is active.

      This is usually used to flag an .identify() call to just update the traits but not “last seen.”

    • channel string
      The channel the event originated from.

      Accepted values:browser,server,mobile

    • ip string
      The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
    • locale string
      The local string for the current user, e.g. en-US.
    • userAgent string
      The user agent of the device making the request
      • content string
      • medium string
        The type of traffic a person/event originates from, like email, or referral.
      • name string
        The campaign name.
      • source string
        The source of traffic—like the name of your email list, Facebook, Google, etc.
      • term string
        The keyword term(s) a user came from.
      • Additional UTM Parameters* string
      • keywords array of [ strings ]
        A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
      • name string
        The name of the page. Reserved for future use.
      • path string
        The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
      • referrer string
        The previous page’s full URL. Equivalent to document.referrer from the DOM API.
      • search string
        The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
      • title string
        The page’s title. Equivalent to document.title from the DOM API.
      • url string
        A page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses location.href from the DOM API.
  • groupId string
    Required ID of the group
    • Enabled/Disabled integrations* boolean
  • messageId string
    A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
  • receivedAt string  (date-time)
    The ISO-8601 timestamp when Data Pipelines receives an event.
  • sentAt string  (date-time)
    The ISO-8601 timestamp when a library sends an event to Data Pipelines.
  • timestamp string  (date-time)
    The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
    • Group Traits* any
  • type string
    Required The event type. This is set automatically by the request method/endpoint.

    Accepted values:group

  • userId string
    Required The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
  • version number
    The version of the API that received the event, automatically set by Customer.io.

Page

The Page method records page views on your website, along with optional extra information about the page a person visited.

SELECT id AS userId, metatitle as name, url, time_on_page
FROM pages
WHERE timestamp > {{last_sync_time}}

In most cases, you’ll simply provide an ID for the user and the page name, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.

    • active boolean

      Whether a user is active.

      This is usually used to flag an .identify() call to just update the traits but not “last seen.”

    • channel string
      The channel the event originated from.

      Accepted values:browser,server,mobile

    • ip string
      The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
    • locale string
      The local string for the current user, e.g. en-US.
    • userAgent string
      The user agent of the device making the request
      • content string
      • medium string
        The type of traffic a person/event originates from, like email, or referral.
      • name string
        The campaign name.
      • source string
        The source of traffic—like the name of your email list, Facebook, Google, etc.
      • term string
        The keyword term(s) a user came from.
      • Additional UTM Parameters* string
      • keywords array of [ strings ]
        A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
      • name string
        The name of the page. Reserved for future use.
      • path string
        The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
      • referrer string
        The previous page’s full URL. Equivalent to document.referrer from the DOM API.
      • search string
        The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
      • title string
        The page’s title. Equivalent to document.title from the DOM API.
      • url string
        A page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses location.href from the DOM API.
    • Enabled/Disabled integrations* boolean
  • messageId string
    A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
  • name string
    Required The name of the page.
    • category string
      The category of the page. This might be useful if you have a single page routes or have a flattened URL structure.
    • Page Properties* any
      Additional properties tha tyou want to send with the page event. By default, we capture `url`, `title`, and stuff.
  • receivedAt string  (date-time)
    The ISO-8601 timestamp when Data Pipelines receives an event.
  • sentAt string  (date-time)
    The ISO-8601 timestamp when a library sends an event to Data Pipelines.
  • timestamp string  (date-time)
    The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
  • type string
    Required The event type. This is set automatically by the request method/endpoint.

    Accepted values:page

  • userId string
    Required The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
  • version number
    The version of the API that received the event, automatically set by Customer.io.

Screen

The Screen method sends screen view events for mobile devices. These help you understand the screens that people use in your app.

SELECT id AS userId, screen_name as name, session_started
FROM screens
WHERE timestamp > {{last_sync_time}}

In most cases, you’ll simply provide an ID for the user and the screen name, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.

  • name string
    The name of the screen the person visited.
    • Screen Properties* any
  • type string
    The event type. This is set automatically by the request method/endpoint.

    Accepted values:screen

  • userId string
    Required The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
      • Enabled/Disabled integrations* boolean
    • messageId string
      A unique identifier for a Data Pipelines call, ensuring that each individual event is unique. This is set by Customer.io
    • receivedAt string  (date-time)
      The ISO-8601 timestamp when Data Pipelines receives an event.
    • sentAt string  (date-time)
      The ISO-8601 timestamp when a library sends an event to Data Pipelines.
    • timestamp string  (date-time)
      The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
    • type string
      The type of source event. This is implicit and set by Customer.io based on the endpoint/method you use (e.g. identify).

      Accepted values:identify,group,track,page,screen,alias

    • version number
      The version of the API that received the event, automatically set by Customer.io.
      • active boolean

        Whether a user is active.

        This is usually used to flag an .identify() call to just update the traits but not “last seen.”

      • channel string
        The channel the event originated from.

        Accepted values:browser,server,mobile

      • ip string
        The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
      • locale string
        The local string for the current user, e.g. en-US.
      • userAgent string
        The user agent of the device making the request
        • build string
          The specific build number in the app.
        • name string
          The name of the app.
        • namespace string
          The app’s namespace.
        • version string
          The version of the app the call originated from.
        • advertisingId string
          The advertising ID is a unique, anonymous ID for advertising.
        • id string
          The device ID.
        • manufacturer string
          The device manufacturer.
        • model string
          The device model.
        • name string
          The device name.
        • type string
          The device type—android, iOS, etc.

          Accepted values:android,ios

        • version string
          The firmware version for the device.
        • bluetooth boolean
          Lets you know if bluetooth is enabled on a device.
        • carrier string
          The cellular carrier the phone uses.
        • cellular boolean
          Indicates whether the device’s cellular connection is enabled or not.
        • wifi boolean
          Indicates whether a device’s wifi connection is enabled or not.
        • name string
          The operating system running on the device.
        • version string
          The version of the OS running on the device.

Alias

The Alias method combines two previously unassociated user identities. Some destinations automatically reconcile profiles with different identifiers based on whether you send anonymousId, userId, or another trait that the destination expects to be unique. But for destinations that don’t, you may need to send alias requests to do this.

In general, you won’t need to use the alias call; we try to handle user identification gracefully, so that you don’t need to merge profiles. But you may need to send alias calls to manage user identities in some destinations.

For example, in Mixpanel it’s used to associate an anonymous user with an identified user once they sign up.

SELECT id AS userId, old_id as previousId
FROM user_resolution
WHERE timestamp >= {{last_sync_time}}
  • previousId string
    Required The userId that you want to merge into the canonical profile.
  • userId string
    Required The userId that you want to keep. This is required if you haven’t already identified someone with one of our web or server-side libraries.
Copied to clipboard!
  Contents
Is this page helpful?