Snowflake Reverse ETL Source
Import data from Snowflake to any destination. This source saves you the trouble of writing code to extract, transform, and load data from your warehouse. Instead, specify the query (or queries) to run, and we’ll handle the rest.
Best Practices
Before you add this integration, you should take some measures to ensure the security of your customers’ data and limit performance impacts to your database. The following “best practice” suggestions can help you limit the potential for data exposure and minimize performance impacts.
Create a new database user/service account. You should have a database user with minimal privileges specifically for Customer.io import/sync operations. This person only requires read permissions with access limited to the tables you want to sync from.
Avoid using your main database instance. You may want to create a read-only database instance with replication in place, lightening the load and preventing data loss on your main instance.
Sync only the data that you’ll use in Customer.io. Limiting your query can improve performance, and minimizes the potential to expose sensitive data. Select only the columns you care about, and make sure you use the
{{last_sync_time}}
to limit your query to data that changed since the previous sync.Limit your sync frequency so that syncs don’t overlap and you don’t cause extra database traffic. You should monitor your first few syncs to ensure that you don’t impact your system’s security and performance.
Set up a Snowflake connector
When you set up a Snowflake source, you’ll need to set up a Snowflake connector that Customer.io will use to connect to Snowflake. We recommend that you use the ACCOUNTADMIN
role to execute all the commands below.
Log in to your Snowflake account and go to Worksheets.
Run this code to create a database. We use the database specified in your connection settings to create a schema called
__customerio_reverse_etl
to avoid collisions with your data. The schema tracks changes to your model query results between syncs.You can reuse an existing database if you want. We recommend that you use the same database across all your syncs attached to this source to keep all your “state tracking” tables in one place.
-- not required if another database is being reused CREATE DATABASE customerio_reverse_etl;
Run this code to create a virtual warehouse. We need to execute queries on your Snowflake account, which requires a Virtual Warehouse to handle the compute. You can also reuse an existing warehouse.
-- not required if reusing another warehouse CREATE WAREHOUSE customerio_reverse_etl WITH WAREHOUSE_SIZE = 'XSMALL' WAREHOUSE_TYPE = 'STANDARD' AUTO_SUSPEND = 600 -- 5 minutes AUTO_RESUME = TRUE;
Run this code to create specific roles for Customer.io. Snowflake access is specified through roles, which you’ll assign to the user you’ll create later.
-- create role CREATE ROLE customerio_reverse_etl; -- warehouse access GRANT USAGE ON WAREHOUSE customerio_reverse_etl TO ROLE customerio_reverse_etl; -- database access GRANT USAGE ON DATABASE customerio_reverse_etl TO ROLE customerio_reverse_etl; GRANT CREATE SCHEMA ON DATABASE customerio_reverse_etl TO ROLE customerio_reverse_etl;
Run this code to create the username and password combination that Customer.io will use to execute queries. Make sure to enter your password where it says
my_strong_password
.-- create user CREATE USER customerio_reverse_etl_user MUST_CHANGE_PASSWORD = FALSE DEFAULT_ROLE = customerio_reverse_etl PASSWORD = 'my_strong_password'; -- Do not use this password -- role access GRANT ROLE customerio_reverse_etl TO USER customerio_reverse_etl_user;
Add your Snowflake source
As a part of this setup, you’ll provide Customer.io with user credentials that we’ll use to query your database. We recommend that you create a new user with Read Only access specifically for Customer.io, so you can manage Customer.io access to your database independent of any other Microsoft SQL users you have.
Make sure your database allows connections from 34.29.50.4. This is the IP address of our reverse ETL service. If this address is blocked by a firewall or blocklist, we won’t be able to connect to your database.
In the Data Pipelines tab, click Sources.
Click Add Source and select Snowflake.
Provide your database information, including credentials to connect to your database, and click Save database.
Set up a Sync. A sync is the type of source data (
identify
,track
, etc) you want to import from your database and click Next: Define Query. You can set up syncs for each type of data you want to import for your source.- Provide a Name and Description for the sync. This helps you understand the sync at a glance when you look at your source Overview later.
- Select the type of data you want to import.
- Set the Sync Frequency, indicating how often you want to query your database for new data. You should set the frequency such that sync operations don’t overlap. Learn more about sync frequency.
- Select when you want to start the sync: whether you want to begin importing data immediately, or schedule the sync to start at a later date.
Enter the query that selects the data you want to import. See Queries below for more information about the information you’ll want to select for your sync. Click Run Query to preview results and make sure that your query selects the right information.
Click Enable to enable your sync.
Now you can set up additional syncs and connect your source to one or more destinations.
Adding syncs
After you set up your source, you can add additional syncs to import different types of data from your database. For example, you might want to import identify
data for your users, and track
data for their actions. Subsequent syncs can rely on your existing database, or you can add another database within your source.
- In your source, go to the Syncs tab and click Add Sync.
- Select your database or add a new one and click Next: Create Sync.
- Set up a syncA sync is the type of source data (
identify
,track
, etc) you want to import from your database. A sync is essentially the type of source call you want to make. and click Next: Define Query. You can set up syncs for each type of data you want to import.- Provide a Name and Description for the sync. This helps you understand the sync at a glance when you look at your source Overview later.
- Select the type of data you want to import.
- Set the Sync Frequency, indicating how often you want to query your database for new data. You should set the frequency such that sync operations don’t overlap. Learn more about sync frequency.
- Select when you want to start the sync: whether you want to begin importing data immediately, or schedule the sync to start at a later date.
- Enter the query that selects the data you want to import. See Queries below for more information about the information you’ll want to select for your sync. Click Run Query to preview results and make sure that your query selects the right information.
- Click Enable to enable your sync.
Sync Frequency
You can sync data as often as every minute. However, we recommend that you set your sync frequency such that sync operations don’t overlap. If you schedule syncs such that a sync operation is scheduled to start while the previous operation is still we’ll skip the next sync operation.
Queries for each sync type
When you create a database sync, you provide a query selecting the people or objects you want to import, and respective attributes. You can query your storage bucket using the same methods in our Data Pipelines API.
When you create a database sync, your query has to account for required fields in a source call. Each query is based around our Data Pipelines APIs. Each row returned from your query represents an individual source operation/API call. Columns represent the traits or properties that you want to apply to the person, group, or event that your sync imports.
See the individual queries below for required fields and examples. Your query cannot return more than 40,000,000 rows and 300 columns.
Identify
The identify
method tells Data Pipelines who the current website visitor is, and lets you assign unique traitsA key-value pair that you associate with a person or an object—like a person’s name, the date they were created in your workspace, or a company’s billing date etc. Use attributes to target people and personalize messages. to a person.
You should call identify
when a user creates an account, logs in, etc. You can also call it again whenever a person’s traits change. We’ve shown a typical call with a traits
object, but we’ve listed all the fields available in an identify
call below.
You can send an identify call with an anonymousId
and/or userId
.
anonymousId
only: This assign traits to a person before you know who they are.userId
only: Identifies a user and sets traits.- both
userId
andanonymousId
: Associates the data sent in previous anonymouspage
,track
, andidentify
calls with the person you identify byuserId
.
SELECT id AS userId, email_address AS email, fname, lname, msisdn AS phone
FROM users
WHERE last_updated >= {{last_sync_time}}
In most cases, you’ll simply provide userId
, anonymousId
, and traits
, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.
- anonymousId stringA unique substitute for a User ID in cases when you don’t have an absolutely unique identifier. Our libraries generate this value automatically to help you track people before they sign up, log in, provide their email, etc.
-
- active boolean
Whether a user is active.
This is usually used to flag an .identify() call to just update the traits but not “last seen.”
- channel stringThe channel the event originated from.
Accepted values:
browser
,server
,mobile
- ip stringThe user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale stringThe local string for the current user, e.g.
en-US
. - userAgent stringThe user agent of the device making the request
-
- content string
- medium stringThe type of traffic a person/event originates from, like
email
, orreferral
. - name stringThe campaign name.
- source stringThe source of traffic—like the name of your email list, Facebook, Google, etc.
- term stringThe keyword term(s) a user came from.
- Additional UTM Parameters* string
-
- keywords array of [ strings ]A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML
meta
tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected. - name stringThe name of the page. Reserved for future use.
- path stringThe path portion of the page’s URL. Equivalent to the canonical
path
which defaults tolocation.pathname
from the DOM API. - referrer stringThe previous page’s full URL. Equivalent to
document.referrer
from the DOM API. - search stringThe query string portion of the page’s URL. Equivalent to
location.search
from the DOM API. - title stringThe page’s title. Equivalent to
document.title
from the DOM API. - url stringA page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses
location.href
from the DOM API.
-
- Enabled/Disabled integrations* boolean
- messageId stringA unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
- receivedAt string (date-time)The ISO-8601 timestamp when Data Pipelines receives an event.
- sentAt string (date-time)The ISO-8601 timestamp when a library sends an event to Data Pipelines.
- timestamp string (date-time)The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
-
- createdAt string (date-time)We recommend that you pass date-time values as ISO 8601 date-time strings. We convert this value to fit destinations where appropriate.
- email stringA person’s email address. In some cases, you can pass an empty
userId
and we’ll use this value to identify a person. - Additional Traits* anyTraits that you want to set on a person. These can take any JSON shape.
- type stringThe event type. This is set automatically by the request method/endpoint.
Accepted values:
identify
- userId stringThe unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
- version numberThe version of the API that received the event, automatically set by Customer.io.
Track
The track
method tells us about actions people take—the events people perform—on your site. Every track
call represents an event. When you send an event with a track
call, you can provide associated properties about the event. For more information, see the track method reference.
You should track events that are indicators on your site, or the kinds of things that you want to know that your audience does on your website, like Video Viewed, Item Purchased or Article Bookmarked. In some products—like Customer.io Journeys—you can use events to trigger messages and other downstream actions.
You can send events with an anonymousId
or a userId
. Calls that you make with an anonymousId
are associated with a userId
when you identify
someone by their userId
.
Track calls typically contain an event
name—so you know what a user did—and a series of properties
, which contain additional information about the event. The examples below show added_to_cart
events, containing properties that let you know about the product
a person added to their shopping cart.
SELECT id AS userId, event_name AS name, products, total_price AS value
FROM events
WHERE timestamp > {{last_sync_time}}
In most cases, you’ll simply provide an ID for the user, the event
, and properties
, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.
-
- active boolean
Whether a user is active.
This is usually used to flag an .identify() call to just update the traits but not “last seen.”
- channel stringThe channel the event originated from.
Accepted values:
browser
,server
,mobile
- ip stringThe user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale stringThe local string for the current user, e.g.
en-US
. - userAgent stringThe user agent of the device making the request
-
- content string
- medium stringThe type of traffic a person/event originates from, like
email
, orreferral
. - name stringThe campaign name.
- source stringThe source of traffic—like the name of your email list, Facebook, Google, etc.
- term stringThe keyword term(s) a user came from.
- Additional UTM Parameters* string
-
- keywords array of [ strings ]A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML
meta
tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected. - name stringThe name of the page. Reserved for future use.
- path stringThe path portion of the page’s URL. Equivalent to the canonical
path
which defaults tolocation.pathname
from the DOM API. - referrer stringThe previous page’s full URL. Equivalent to
document.referrer
from the DOM API. - search stringThe query string portion of the page’s URL. Equivalent to
location.search
from the DOM API. - title stringThe page’s title. Equivalent to
document.title
from the DOM API. - url stringA page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses
location.href
from the DOM API.
- event stringRequired The name of the event
-
- Enabled/Disabled integrations* boolean
- messageId stringA unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
-
- Event Properties* anyAdditional properties that you want to capture in the event. These can take any JSON shape.
- receivedAt string (date-time)The ISO-8601 timestamp when Data Pipelines receives an event.
- sentAt string (date-time)The ISO-8601 timestamp when a library sends an event to Data Pipelines.
- timestamp string (date-time)The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
- type stringRequired The event type. This is set automatically by the request method/endpoint.
Accepted values:
track
- userId stringRequired The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
- version numberThe version of the API that received the event, automatically set by Customer.io.
Group
The Group method associates an identified person with a group—like a company, organization, project, online class or any other collective noun you come up with for the same concept. In Customer.io Journeys, we call groups objectsNot to be confused with a JSON object, an object in Customer.io is a non-person entity that you can associate with one or more people—like a company, account, or online course. You can use objects to message people based on changes to their company, account, or colege course itinerary..
Group calls are useful for destinations where you maintain relationships between people and larger organizations, like in Customer.io! In Customer.io Journeys, you can store groups as objectsNot to be confused with a JSON object, an object in Customer.io is a non-person entity that you can associate with one or more people—like a company, account, or online course. You can use objects to message people based on changes to their company, account, or colege course itinerary., and trigger campaigns based on a person’s relationship to an object—like an account, online class, and so on.
Find more details about group
, including the group
payload, in our API spec.
Remember, group calls represent both an organization/group and relationships with users (by userId
). Your query should include not only the groupId
, but the userId
so that you can capture relationships at your destinations.
SELECT companyId AS groupId, companyname, employees, personId AS userId
FROM companies
WHERE last_updated >= {{last_sync_time}}
In most cases, you’ll simply provide an ID for the user, the groupId
, and traits
, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.
-
- active boolean
Whether a user is active.
This is usually used to flag an .identify() call to just update the traits but not “last seen.”
- channel stringThe channel the event originated from.
Accepted values:
browser
,server
,mobile
- ip stringThe user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale stringThe local string for the current user, e.g.
en-US
. - userAgent stringThe user agent of the device making the request
-
- content string
- medium stringThe type of traffic a person/event originates from, like
email
, orreferral
. - name stringThe campaign name.
- source stringThe source of traffic—like the name of your email list, Facebook, Google, etc.
- term stringThe keyword term(s) a user came from.
- Additional UTM Parameters* string
-
- keywords array of [ strings ]A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML
meta
tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected. - name stringThe name of the page. Reserved for future use.
- path stringThe path portion of the page’s URL. Equivalent to the canonical
path
which defaults tolocation.pathname
from the DOM API. - referrer stringThe previous page’s full URL. Equivalent to
document.referrer
from the DOM API. - search stringThe query string portion of the page’s URL. Equivalent to
location.search
from the DOM API. - title stringThe page’s title. Equivalent to
document.title
from the DOM API. - url stringA page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses
location.href
from the DOM API.
- groupId stringRequired ID of the group
-
- Enabled/Disabled integrations* boolean
- messageId stringA unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
- receivedAt string (date-time)The ISO-8601 timestamp when Data Pipelines receives an event.
- sentAt string (date-time)The ISO-8601 timestamp when a library sends an event to Data Pipelines.
- timestamp string (date-time)The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
-
- Group Traits* any
- type stringRequired The event type. This is set automatically by the request method/endpoint.
Accepted values:
group
- userId stringRequired The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
- version numberThe version of the API that received the event, automatically set by Customer.io.
Page
The Page method records page views on your website, along with optional extra information about the page a person visited.
SELECT id AS userId, metatitle as name, url, time_on_page
FROM pages
WHERE timestamp > {{last_sync_time}}
In most cases, you’ll simply provide an ID for the user and the page name
, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.
-
- active boolean
Whether a user is active.
This is usually used to flag an .identify() call to just update the traits but not “last seen.”
- channel stringThe channel the event originated from.
Accepted values:
browser
,server
,mobile
- ip stringThe user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale stringThe local string for the current user, e.g.
en-US
. - userAgent stringThe user agent of the device making the request
-
- content string
- medium stringThe type of traffic a person/event originates from, like
email
, orreferral
. - name stringThe campaign name.
- source stringThe source of traffic—like the name of your email list, Facebook, Google, etc.
- term stringThe keyword term(s) a user came from.
- Additional UTM Parameters* string
-
- keywords array of [ strings ]A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML
meta
tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected. - name stringThe name of the page. Reserved for future use.
- path stringThe path portion of the page’s URL. Equivalent to the canonical
path
which defaults tolocation.pathname
from the DOM API. - referrer stringThe previous page’s full URL. Equivalent to
document.referrer
from the DOM API. - search stringThe query string portion of the page’s URL. Equivalent to
location.search
from the DOM API. - title stringThe page’s title. Equivalent to
document.title
from the DOM API. - url stringA page’s full URL. Segment first looks for the canonical URL. If the canonical URL is not provided, Segment uses
location.href
from the DOM API.
-
- Enabled/Disabled integrations* boolean
- messageId stringA unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
- name stringRequired The name of the page.
-
- category stringThe category of the page. This might be useful if you have a single page routes or have a flattened URL structure.
- Page Properties* anyAdditional properties tha tyou want to send with the page event. By default, we capture `url`, `title`, and stuff.
- receivedAt string (date-time)The ISO-8601 timestamp when Data Pipelines receives an event.
- sentAt string (date-time)The ISO-8601 timestamp when a library sends an event to Data Pipelines.
- timestamp string (date-time)The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
- type stringRequired The event type. This is set automatically by the request method/endpoint.
Accepted values:
page
- userId stringRequired The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
- version numberThe version of the API that received the event, automatically set by Customer.io.
Screen
The Screen method sends screen view events for mobile devices. These help you understand the screens that people use in your app.
SELECT id AS userId, screen_name as name, session_started
FROM screens
WHERE timestamp > {{last_sync_time}}
In most cases, you’ll simply provide an ID for the user and the screen name
, but we’ve provided the full list of writable-fields below. You’ll find the complete source payload in our API documentation.
- name stringThe name of the screen the person visited.
-
- Screen Properties* any
- type stringThe event type. This is set automatically by the request method/endpoint.
Accepted values:
screen
- userId stringRequired The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.
-
- Enabled/Disabled integrations* boolean
- messageId stringA unique identifier for a Data Pipelines call, ensuring that each individual event is unique. This is set by Customer.io
- receivedAt string (date-time)The ISO-8601 timestamp when Data Pipelines receives an event.
- sentAt string (date-time)The ISO-8601 timestamp when a library sends an event to Data Pipelines.
- timestamp string (date-time)The ISO-8601 timestamp when the event originally took place. This is mostly useful when you backfill data past events. If you’re not backfilling data, you can leave this field empty and we’ll use the current time or server time.
- type stringThe type of source event. This is implicit and set by Customer.io based on the endpoint/method you use (e.g.
identify
).Accepted values:
identify
,group
,track
,page
,screen
,alias
- version numberThe version of the API that received the event, automatically set by Customer.io.
-
- active boolean
Whether a user is active.
This is usually used to flag an .identify() call to just update the traits but not “last seen.”
- channel stringThe channel the event originated from.
Accepted values:
browser
,server
,mobile
- ip stringThe user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale stringThe local string for the current user, e.g.
en-US
. - userAgent stringThe user agent of the device making the request
-
- build stringThe specific build number in the app.
- name stringThe name of the app.
- namespace stringThe app’s namespace.
- version stringThe version of the app the call originated from.
-
- advertisingId stringThe advertising ID is a unique, anonymous ID for advertising.
- id stringThe device ID.
- manufacturer stringThe device manufacturer.
- model stringThe device model.
- name stringThe device name.
- type stringThe device type—android, iOS, etc.
Accepted values:
android
,ios
- version stringThe firmware version for the device.
-
- bluetooth booleanLets you know if bluetooth is enabled on a device.
- carrier stringThe cellular carrier the phone uses.
- cellular booleanIndicates whether the device’s cellular connection is enabled or not.
- wifi booleanIndicates whether a device’s wifi connection is enabled or not.
-
- name stringThe operating system running on the device.
- version stringThe version of the OS running on the device.
Alias
The Alias method combines two previously unassociated user identities. Some destinations automatically reconcile profiles with different identifiers based on whether you send anonymousId
, userId
, or another trait that the destination expects to be unique. But for destinations that don’t, you may need to send alias
requests to do this.
In general, you won’t need to use the alias
call; we try to handle user identification gracefully, so that you don’t need to merge profiles. But you may need to send alias
calls to manage user identities in some destinations.
For example, in Mixpanel it’s used to associate an anonymous user with an identified user once they sign up.
SELECT id AS userId, old_id as previousId
FROM user_resolution
WHERE timestamp >= {{last_sync_time}}
- previousId stringRequired The userId that you want to merge into the canonical profile.
- userId stringRequired The userId that you want to keep. This is required if you haven’t already identified someone with one of our web or server-side libraries.