Data Warehouses

Free ✓
Team ✓
Business ✓
Add-on x
?

Free and Team plan workspaces can have 1 warehouse. Business plans can have more than one, and include custom sync schedules and filtering.
See the available plans, or contact Support.

What’s a Warehouse?

A warehouse is a central repository of data collected from one or more sources. This is what commonly comes to mind when you think about a relational database: structured data that fits neatly into rows and columns.

In Segment, a Warehouse is a special type of destination. Instead of streaming data to the destination all the time, we load data to them in bulk at regular intervals. When we load data, we insert and update events and objects, and automatically adjust their schema to fit the data you’ve sent to Segment.

When selecting and building a data warehouse, consider three questions:

  1. What type of data will be collected?
  2. How many data sources will there be?
  3. How will the data be used?

Relational databases are great when you know and predefine the information collected and how it will be linked. This is usually the type of database used in the world of user analytics. For instance, a users table might be populated with the columns name, email_address, or plan_name.

Examples of data warehouses include Amazon Redshift, Google BigQuery, and Postgres.

When Segment loads data into your warehouse, each sync goes through two steps:

  1. Ping: Segment servers connect to your warehouse. For Redshift warehouses, Segment also runs a query to determine how many slices a cluster has. Common reasons a sync might fail at this step include a blocked VPN or IP, a warehouse that isn’t set to be publicly accessible, or an issue with user permissions or credentials.
  2. Load: Segment de-duplicates the transformed data and loads it into your warehouse. If you have queries set up in your warehouse, they run after the data is loaded into your warehouse.

Looking for the Warehouse Schemas docs?

They’ve moved: Warehouse Schemas.

More Help

How do I send custom data to my warehouse?

How do I give users permissions to my warehouse?

Check out the Frequently Asked Questions about Warehouses page and a list of helpful SQL queries to get you started with Redshift .

FAQs

How do I decide between Redshift, Postgres, and BigQuery?

What do you recommend for Postgres: Amazon or Heroku?

How do I give users permissions?

What are the limitations of Redshift clusters and warehouses connectors?

Where do I find my source slug?

Setting up a warehouse

How do I create a user, grant usage on a schema and then grant the privileges that the user will need to interact with that schema?

Which IPs should I allowlist?

Will Segment sync my historical data?

Can I load in my own data into my warehouse?

Can I control what data is sent to my warehouse?

Managing a warehouse

How fresh is the data in my warehouse?

Can I add, tweak, or delete some of the tables?

Can I transform or clean up old data to new formats or specs?

What are common errors and how do I debug them?

How do I speed up my Redshift queries?

Analyzing with SQL

How do I forecast LTV with SQL and Excel for e-commerce businesses?

How do I measure the ROI of my Marketing Campaigns?

This page was last modified: 12 Aug 2024



Get started with Segment

Segment is the easiest way to integrate your websites & mobile apps data to over 300 analytics and growth tools.
or
Create free account