Yandex

Send Customer.io data about messages, people, metrics, etc to Yandex. From here, you can ingest your data into the data warehouse of your choosing. This integration syncs up to every 15 minutes, helping you keep up to date on your audience’s message activities.

How it works

This integration exports individual parquet files for Deliveries, Metrics, Subjects, Outputs, People, and Attributes to your storage bucket. Each parquet file contains data that changed since the last export.

Once the parquet files are in your storage bucket, you can import them into data platforms like Fivetran or data warehouses like Redshift, BigQuery, and Snowflake.

Note that this integration only publishes parquet files to your storage bucket. You must set your data warehouse to ingest this data. There are many approaches to ingesting data, but it typically requires a COPY command to load the parquet files from your bucket. After you load parquet files, you should set them to expire to delete them automatically.

We attempt to export parquet files every 15 minutes, though actual sync intervals and processing times may vary. When syncing large data sets, or Customer.io experiences a high volume of concurrent sync operations, it can take up to several hours to process and export data. This feature is not intended to sync data in real time.

sequenceDiagram participant a as Customer.io participant b as Storage Bucket participant c as Yandex loop up to every 15 minutes a->>b: export parquet files b->>c: ingest c->>b: expire/delete files
before next sync end

 Your initial sync includes historical data

During the first sync, you’ll receive a history of your Deliveries, Metrics, Subjects, and Outputs data. However, People who have been deleted or suppressed before the first sync are not included in the People file export and the historical data in the other export files is anonymized for the deleted and suppressed People.

The initial export vs incremental exports

Your initial sync is a set of files containing historical data to represent your workspace’s current state. Subsequent sync files contain changesets.

  • Metrics: The initial metrics sync is broken up into files with two sequence numbers, as follows. <name>_v3_<workspace_id>_<sequence1>_<sequence2>.
  • Attributes: The initial Attributes exports include a list of profiles and their current attributes. Subsequent files will only contain attribute changes, with one change per row.
flowchart LR a{is it the initial sync?}-->|yes|b[send all history] a-->|no|c{was the file
already enabled?} c-->|yes|d[send changes since last sync] c-->|no|e{was the file
ever enabled?} e-->|yes|f[send changeset since
file was disabled] e-->|no|g[send all history]

Set up an Amazon S3 data-out integration

Before you begin, make sure that you’re prepared to ingest relevant parquet files from Customer.io. For S3, you’ll need to set up your bucket with relevant permissions before you can sync data. Click here to see an example containing required S3 permissions.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KMSPermissions",
            "Effect": "Allow",
            "Action": [
                "kms:GetPublicKey",
                "kms:ImportKeyMaterial",
                "kms:Decrypt",
                "kms:Verify",
                "kms:Encrypt",
                "kms:GenerateDataKey"
            ],
            "Resource": [
                "arn:aws:kms:us-west-2:111122223333:key/<key>"
            ]
        },
        {
            "Sid": "S3Permissions",
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket",
                "s3:ListBucketVersions",
                "s3:ListBucket",

                "s3:GetBucketAcl",
                "s3:GetBucketPolicy",
                "s3:GetBucketLocation",

                "s3:DeleteBucket",
                "s3:DeleteObjectTagging",

                "s3:GetObject",
                "s3:GetObjectTagging",

                "s3:PutObject",
                "s3:PutObjectTagging",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::bucket/directory",
                "arn:aws:s3:::bucket/directory/*",
                "arn:aws:s3:::bucket"
            ]
        }
    ]
}
  1. Create an Access Key and a Service Key with read/write permissions to your S3 or Yandex bucket.
  2. Go to Integrations and select Yandex and then click Sync Bucket.
  3. Enter information about your bucket and click Select data.
    • Enter the Name of your bucket.
    • Enter the path to your bucket.
    • Paste your Access and Secret keys in the appropriate fields.
    • Select the Region your bucket is in.
dws-s3-setup.png
dws-s3-setup.png
  1. Select the data types that you want to export from Customer.io to your bucket. By default, we export all data types, but you can disable the types that you aren’t interested in.
  2. Click Create and sync data.

Pausing and resuming your sync

You can turn off files you no longer want to receive, or pause them momentarily as you update your integration, and turn them back on. When you turn a file schema on, we send files to catch you up from the last export.If you haven’t exported a particular file before—the file was never “on”—the initial sync contains your historical data.

You can also disable your entire sync, in which case we’ll quit sending files all together. When you enable your sync again, we send all of your historical data as if you’re starting a new integration. Before you disable a sync, consider if you simply want to disable individual files and resume them later.

 Delete old sync files before you re-enable a sync

Before you resume a sync that you previously disabled, you should clear any old files from your storage bucket so that there’s no confusion between your old files and the files we send with the re-enabled sync.

Disabling and enabling individual export files

  1. Go to Data & Integrations > Integrations and select Yandex.
    1. Select the files you want to turn on or off.

    When you enable a file, the next sync will contain baseline historical data catching up from your previous sync or the complete history if you haven’t synced a file before; subsequent syncs will contain changesets.

     Turning the People file off

    If you turn the People file off for more than 7 days, you will not be able to re-enable it. You’ll need to delete your sync configuration, purge all sync files from your destination storage bucket, and create a new sync to resume syncing people data.

    Turn files on or off
    Turn files on or off

Disabling your sync

If your sync is already disabled, you can enable it again with these instructions. But, before you re-enable your sync, you should clear the previous sync files from your data warehouse bucket first. See Pausing and resuming your sync for more information.

  1. Go to Data & Integrations > Integrations and select Yandex.
  2. Click Disable Sync.

Manage your configuration

You can change settings for a bucket, if your path changes or you need to swap keys for security purposes.

  1. Go to Data & Integrations > Integrations and select Yandex.
  2. Click Manage Configuration for your bucket.
  3. Make your changes. No matter your changes, you must input your Service Account Key (GCS) or Secret Key (S3, Yandex) again.
  4. Click Update Configuration. Subsequent syncs will use your new configuration.
Edit your data warehouse bucket configuration
Edit your data warehouse bucket configuration

Update sync schema version

Before you prepare to update your data warehouse sync version, see the changelog below. You may need to update your ingestion logic to support updated schemas, filenames, and new data types.

 When updating from v1 to a later version, you must:

  • Update ingestion logic to accept the new file name format: <name>_v<x>_<workspace_id>_<sequence>.parquet
  • Delete existing rows in your Subjects and Outputs tables. When you update, we send all of your Subjects and Outputs data from the beginning of your history using the new file schema.

dws-upgrade-version.png
dws-upgrade-version.png
  1. Go to Data & Integrations > Integrations and select Yandex.
  2. Click Upgrade Schema Version.
  3. Follow the instructions to make sure that your ingestion logic is updated accordingly.
  4. Confirm that you’ve made the appropriate pages and click Upgrade sync. The next sync uses the updated schema version.

Parquet file schemas

This section describes the different kinds of files you can export from our Database-out integrations. Many schemas include an internal_customer_id—this is the cio_idAn identifier for a person that is automatically generated by Customer.io and cannot be changed. This identifier provides a complete, unbroken record of a person across changes to their other identifiers (id, email, etc).. You can use it to resolve a person associated with a subject, delivery, etc.

These schemas represent the latest versions available. Check out our changelog for information about earlier versions.

Deliveries are individual Email, push, SMS, slack, and webhook records sent from your workspace. The first deliveries export file includes baseline historical data. Subsequent files contain rows for data that changed since the last export.

Field NamePrimary KeyForeign KeyDescription
workspace_idINTEGER (Required). The ID of the Customer.io workspace associated with the delivery record.
delivery_idSTRING (Required). The ID of the delivery record.
internal_customer_idPeopleSTRING (Nullable). The cio_id of the person in question. Use the people parquet file to resolve this ID to an external customer_id or email address.
subject_idSubjectsSTRING (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the path the person went through in the workflow.
event_idMetricsSTRING (Nullable). If the delivery was created as part of an event-triggered Campaign, this is the ID for the unique event that triggered the workflow.
delivery_typeSTRING (Required). The type of delivery: email, push, sms, slack, or webhook.
campaign_idINTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the Campaign or API Triggered Broadcast.
action_idINTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the unique workflow item that caused the delivery to be created.
newsletter_idINTEGER (Nullable). If the delivery was created as part of a Newsletter, this is the unique ID of that Newsletter.
content_idINTEGER (Nullable). If the delivery was created as part of a Newsletter split test, this is the unique ID of the Newsletter variant.
trigger_idINTEGER (Nullable). If the delivery was created as part of an API Triggered Broadcast, this is the unique trigger ID associated with the API call that triggered the broadcast.
created_atTIMESTAMP (Required). The timestamp the delivery was created at.
transactional_message_idINTEGER (Nullable). If the delivery occurred as a part of a transactional message, this is the unique identifier for the API call that triggered the message.
seq_numINTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

Metrics exports detail events relating to deliveries (e.g. messages sent, opened, etc). Your initial metrics export contains baseline historical data, broken up into files with two sequence numbers, as follows: <name>_v3_<workspace_id>_<sequence1>_sequence2>.

Subsequent files contain rows for data that changed since the last export.

Field NamePrimary KeyForeign KeyDescription
event_idSTRING (Required). The unique ID of the metric event. This can be useful for deduplicating purposes.
workspace_idINTEGER (Required). The ID of the Customer.io workspace associated with the metric record.
delivery_idDeliveriesSTRING (Required). The ID of the delivery record.
metricSTRING (Required). The type of metric (e.g. sent, delivered, opened, clicked).
reasonSTRING (Nullable). For certain metrics (e.g. attempted), the reason behind the action.
link_idINTEGER (Nullable). For "clicked" metrics, the unique ID of the link being clicked.
link_urlSTRING (Nullable). For "clicked" metrics, the URL of the clicked link. (Truncated to 1000 bytes.)
created_atTIMESTAMP (Required). The timestamp the metric was created at.
seq_numINTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

Outputs are the unique steps within each workflow journey. The first outputs file includes historical data. Subsequent files contain rows for data that changed since the last export.

 Outputs are not available in v1

Upgrade to v2 or later to get outputs data.

Field NamePrimary KeyForeign KeyDescription
workspace_idINTEGER (Required). The ID of the Customer.io workspace associated with the output record.
output_idSTRING (Required). The ID for the step of the unique path a person went through in a Campaign or API Triggered Broadcast workflow.
subject_idSubjectsSTRING (Required). The ID for the path a person went through in a Campaign or API Triggered Broadcast workflow.
output_typeSTRING (Required). The type of step a person went through in a Campaign or API Triggered Broadcast workflow. Note that the "delay" output_type covers many use cases: a Time Delay or Time Window workflow item, a "grace period", or a Date-based campaign trigger.
action_idINTEGER (Required). The ID for the unique workflow item associated with the output.
explanationSTRING (Required). The explanation for the output.
delivery_idDeliveriesSTRING (Nullable). If a delivery resulted from this step of the workflow, this is the ID of that delivery.
draftBOOLEAN (Nullable). If a delivery resulted from this step of the workflow, this indicates whether the delivery was created as a draft.
link_trackedBOOLEAN (Nullable). If a delivery resulted from this step of the workflow, this indicates whether links within the delivery are configured for tracking.
split_test_indexINTEGER (Nullable). If the step of the workflow was a Split Test, this indicates the variant of the Split Test.
delay_ends_atTIMESTAMP (Nullable). If the step of the workflow involves a delay, this is the timestamp for when the delay will end.
branch_indexINTEGER (Nullable). If the step of the workflow was a T/F Branch, a Multi-Split Branch, or a Random Cohort Branch, this indicates the branch that was followed.
manual_segment_idINTEGER (Nullable). If the step of the workflow was a Manual Segment Update, this is the ID of the Manual Segment involved.
add_to_manual_segmentBOOLEAN (Nullable). If the step of the workflow was a Manual Segment Update, this indicates whether a person was added or removed from the Manual Segment involved.
created_atTIMESTAMP (Required). The timestamp the output was created at.
seq_numINTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

The first People export file includes a list of current people at the time of your first sync (deleted or suppressed people are not be included in the first file). Subsequent exports include people who were created, deleted, or suppressed since the last export.

As of v3, people exports come in two different files:

  • people_v3_<env>_<seq>.parquet: Contains new people.
  • people_v3_chngs_<env>_<seq>.parquet: Contains changes to people since the previous sync.

These files have an identical structure and a part of the same data set. You should import them to the same table.

Field NamePrimary KeyForeign KeyDescription
workspace_idINTEGER (Required). The ID of the Customer.io workspace associated with the person.
customer_idSTRING (Required). The ID of the person in question. This will match the ID you see in the Customer.io UI.
internal_customer_idSTRING (Required). The cio_id of the person in question. Use the people parquet file to resolve this ID to an external customer_id or email address.
deletedBOOLEAN (Nullable). This indicates whether the person has been deleted.
suppressedBOOLEAN (Nullable). This indicates whether the person has been suppressed.
created_atTIMESTAMP (Required). The timestamp the profile was created at. Note that this is the timestamp the profile was created in Customer.io's database, which may not match the created_at profile attribute.
updated_atTIMESTAMP (Required) The date-time when a person was updated. Use the most recent updated_at value for a customer_id to disambiguate between multiple records.
email_addrSTRING (Optional) The email address of the person. For workspaces using email as a unique identifier, this value may be the same as the customer_id.

Subjects are the unique workflow journeys that people take through Campaigns and API Triggered Broadcasts. The first subjects export file includes baseline historical data. Subsequent files contain rows for data that changed since the last export.

 Subjects are not available in v1

Upgrade to v2 or later to get subjects data.

Field NamePrimary KeyForeign KeyDescription
workspace_idINTEGER (Required). The ID of the Customer.io workspace associated with the subject record.
subject_idSTRING (Required). The ID for the path a person went through in a Campaign or API Triggered Broadcast workflow.
subject_nameSTRING (Required). A secondary unique ID for the path a person took through a campaign or broadcast workflow.
internal_customer_idPeopleSTRING (Nullable). The cio_id of the person in question. Use the people parquet file to resolve this ID to an external customer_id or email address.
campaign_typeSTRING (Required). The type of Campaign (segment, event, or triggered_broadcast)
campaign_idINTEGER (Required). The ID of the Campaign or API Triggered Broadcast.
event_idMetricsSTRING (Nullable). The ID for the unique event that triggered the workflow.
trigger_idINTEGER (Optional). If the delivery was created as part of an API Triggered Broadcast, this is the unique trigger ID associated with the API call that triggered the broadcast.
started_campaign_atTIMESTAMP (Required). The timestamp when the person first matched the campaign trigger. For event-triggered campaigns, this is the timestamp of the trigger event. For segment-triggered campaigns, this is the time the user entered the segment.
created_atTIMESTAMP (Required). The timestamp the subject was created at.
seq_numINTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

Attribute exports represent changes to people (by way of their attribute values) over time. The initial Attributes export includes a list of profiles and their current attributes. Subsequent files contain attribute changes, with one change per row.

Field NamePrimary KeyForeign KeyDescription
workspace_idINTEGER (Required). The ID of the Customer.io workspace associated with the person.
internal_customer_idSTRING (Required). The cio_id of the person in question. Use the people parquet file to resolve this ID to an external customer_id or email address.
attribute_nameSTRING (Required). The attribute that was updated.
attribute_valueSTRING (Required). The new value of the attribute.
timestampTIMESTAMP (Required). The timestamp of the attribute update.
Copied to clipboard!