
Introducing the dbt MCP Server – Bringing Structured Data to AI Workflows and Agents

· 16 min read
Jason Ganz
Developer Experience at dbt Labs

dbt is the standard for creating governed, trustworthy datasets on top of your structured data. MCP is showing increasing promise as the standard for providing context to LLMs, allowing them to function at a high level in real-world, operational scenarios.

Today, we are open sourcing an experimental version of the dbt MCP server. We expect that over the coming years, structured data is going to become heavily integrated into AI workflows and that dbt will play a key role in building and provisioning this data.

In particular, we expect both Business Intelligence and Data Engineering will be driven by AI operating on top of the context defined in your dbt projects.

We are committed to building the data control plane that enables AI to reliably access structured data from across your entire data lineage. Over the coming months and years, data teams will increasingly focus on building the rich context that feeds into the dbt MCP server. Both AI agents and business stakeholders will then operate on top of LLM-driven systems hydrated by the dbt MCP context.

Today’s system is not a full realization of the vision in the posts shared above, but it is a meaningful step towards safely integrating your structured enterprise data into AI workflows. In this post, we’ll walk through what the dbt MCP server can do today, some tips for getting started, and some of the limitations of the current implementation.

We believe it is important for the industry to start coalescing on best practices for safe and trustworthy ways to access your business data via LLM.

What is MCP?

MCP stands for Model Context Protocol - an open protocol released by Anthropic in November 2024 that allows AI systems to dynamically pull in context and data. Why does this matter?

Even the most sophisticated models are constrained by their isolation from data—trapped behind information silos and legacy systems. Every new data source requires its own custom implementation, making truly connected systems difficult to scale.

MCP addresses this challenge. It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol. - Anthropic

Since then, MCP has become widely supported, with Google, Microsoft, and OpenAI all committing to support it.

What does the dbt MCP Server do?

Think of it as the missing glue between:

  • Your dbt project (models, docs, lineage, Semantic Layer)
  • Any MCP‑enabled client (Claude Desktop Projects, Cursor, agent frameworks, custom apps, etc.)

We’ve known for a while that structured data from your dbt project + LLMs is a potent combination (particularly when using the dbt Semantic Layer). The question has been: what is the best way to provision this across a wide variety of LLM applications in a way that puts the power in the hands of the Community and the ecosystem, rather than us building out a series of one-off integrations?

The dbt MCP server provides access to a set of tools that operate on top of your dbt project. These tools can be called by LLM systems to learn about your data and metadata.

As with any AI workflow, take appropriate caution before giving these tools access to production systems and data. Consider starting in a sandbox environment or only granting read permissions.

There are three primary functions of the dbt MCP server today.

Three use‑case pillars of the dbt MCP server
  • Data discovery: Understand what data assets exist in your dbt project.
  • Data querying: Directly query the data in your dbt project. This has two components:
    • Use the dbt Semantic Layer for trustworthy, single source of truth reporting on your metrics
    • Execution of SQL queries for more freewheeling data exploration and development
  • Run commands within dbt: Access the dbt CLI to run a project and perform other operations
How the dbt MCP server fits between data sources and MCP‑enabled clients

❓Do I need to be a dbt Cloud customer to use the dbt MCP server?

  • No - functionality for both dbt Cloud and dbt Core users is included in the MCP server. Over time, Cloud-specific services will be built into the MCP server where they provide differentiated value.

Let’s walk through examples of these and why each of them can be helpful in human-driven and agent-driven use cases:

Using the dbt MCP Server for Data Asset Discovery

dbt has knowledge about the data assets that exist across your entire data stack, from raw staging models to polished analytical marts. The dbt MCP server exposes this knowledge in a way that makes it accessible to LLMs and AI agents, enabling powerful discovery capabilities:

  • For human stakeholders: Learn about your production dbt project interactively through natural language. Business users can ask questions like "What customer data do we have?" or "Where do we store marketing spend information?" and receive accurate information based on your dbt project's documentation and structure.
  • For AI agent workflows: Automatically discover and understand the available data models, their relationships, and their structures without human intervention. This allows agents to autonomously navigate complex data environments and produce accurate insights. This can be useful context for any agent that needs to operate on top of information in a data platform.

The data discovery tools allow LLMs to understand what data exists, how it's structured, and how different data assets relate to each other. This contextual understanding is essential for generating accurate SQL, answering business questions, and providing trustworthy data insights.

Data Asset Discovery Tools:

Note: you do not need to access any of these tools directly in your workflow. Rather, the MCP client will use the context you have provided to determine which tool is most appropriate to use at a given time.

| Tool Name | Purpose | Output |
| --- | --- | --- |
| get_all_models | Provides a complete inventory of all models in the dbt project, regardless of type | List of all model names and their descriptions |
| get_mart_models | Identifies presentation-layer models specifically designed for end-user consumption | List of mart model names and descriptions (models in the reporting layer) |
| get_model_details | Retrieves comprehensive information about a specific model | Compiled SQL, description, column names, column descriptions, and column data types |
| get_model_parents | Identifies upstream dependencies for a specific model | List of parent models that the specified model depends on |
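
To make this concrete, here is a minimal sketch of how a custom MCP client could call the discovery tools over stdio using the MCP Python SDK. The launch command and the `model_name` argument are assumptions for illustration (check the dbt-mcp README for the actual invocation, environment variables, and tool schemas); only the tool names come from the table above.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: the dbt MCP server is launched locally with a command like this.
# The real command and required environment variables are in the dbt-mcp README.
server_params = StdioServerParameters(command="uvx", args=["dbt-mcp"])


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List every tool the server exposes (discovery, Semantic Layer, CLI).
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Full inventory of models in the project.
            all_models = await session.call_tool("get_all_models", arguments={})
            print(all_models)

            # Drill into a single model; the argument name here is hypothetical.
            details = await session.call_tool(
                "get_model_details", arguments={"model_name": "fct_orders"}
            )
            print(details)


asyncio.run(main())
```

In practice, an MCP-enabled client like Claude Desktop or Cursor wires up this plumbing for you; the sketch is only meant to show what the tool calls look like under the hood.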

Using the dbt MCP server for querying data via the dbt Semantic Layer

The dbt Semantic Layer defines your organization's metrics and dimensions in a consistent, governed way. With the dbt MCP server, LLMs can understand and query these metrics directly, ensuring that AI-generated analyses are consistent with your organization's definitions.

  • For human stakeholders: Request metrics using natural language. Users can ask for "monthly revenue by region" and get accurate results that match your organization's standard metric definitions, with a higher baseline of accuracy than LLM-generated SQL queries.
  • For AI agent workflows: As agentic systems take action in the real world over a longer time horizon, they will need ways to understand the underlying reality of your business. From feeding into deep research style reports to feeding operational agents, the dbt Semantic Layer can provide a trusted underlying interface for LLM systems.

By leveraging the dbt Semantic Layer through the MCP server, you ensure that LLM-generated analyses are based on rigorous definitions instantiated as code, flexibly available in any MCP-supported client.

Semantic Layer Tools:

| Tool Name | Purpose | Output |
| --- | --- | --- |
| list_metrics | Provides an inventory of all available metrics in the dbt Semantic Layer | Complete list of metric names, types, labels, and descriptions |
| get_dimensions | Identifies available dimensions for specified metrics | List of dimensions that can be used to group/filter the specified metrics |
| query_metrics | Executes queries against metrics in the dbt Semantic Layer | Query results based on specified metrics, dimensions, and filters |
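
As a sketch of what an agent-side call might look like (reusing a `ClientSession` connected as in the discovery example above), the argument names below (`metrics`, `group_by`) are assumptions about the tool schemas, not confirmed here:

```python
from mcp import ClientSession


async def monthly_revenue_by_region(session: ClientSession):
    """Sketch: answer "monthly revenue by region" via the Semantic Layer tools."""
    # See which governed metrics exist.
    metrics = await session.call_tool("list_metrics", arguments={})
    print(metrics)

    # Check which dimensions are valid for the chosen metric (argument name assumed).
    dimensions = await session.call_tool(
        "get_dimensions", arguments={"metrics": ["revenue"]}
    )
    print(dimensions)

    # Query the metric grouped by month and region; results reflect the governed definition.
    return await session.call_tool(
        "query_metrics",
        arguments={"metrics": ["revenue"], "group_by": ["metric_time__month", "region"]},
    )
```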

Using the dbt MCP server for SQL execution to power text-to-SQL

While the dbt Semantic Layer provides a governed, metrics-based approach to data querying, there are many analytical needs that require more flexible, exploratory SQL queries. The dbt MCP server will soon include SQL validation and querying capabilities with rich context awareness.

  • For human stakeholders: Ask complex analytical questions that go beyond predefined metrics. Users can explore data freely while still benefiting from the LLM's understanding of their specific data models, ensuring that generated SQL is correct and optimized for your environment.
  • For AI agent workflows: Generate and validate SQL against your data models automatically. Agents can create and execute complex queries that adapt to schema changes, optimize for performance, and follow your organization's SQL patterns and conventions.

Unlike traditional SQL generation, queries created through the dbt MCP server will be aware of your specific data models, making them more accurate and useful for your particular environment. This capability is particularly valuable for data exploration, one-off analyses, and prototype development that might later be incorporated into your dbt project.

Currently, SQL execution is managed through the dbt Show tool; over the near term, we expect to release tooling that is more performant and fit for this precise use case.
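
For illustration only, a free-form exploratory query might flow through the server like the sketch below; the tool name `show` and the `sql_query` argument are hypothetical stand-ins for however the dbt Show-based execution is exposed in your version of the server.

```python
from mcp import ClientSession


async def explore_orders(session: ClientSession):
    """Sketch: ad-hoc SQL exploration; "show" and "sql_query" are assumed names."""
    return await session.call_tool(
        "show",
        arguments={
            # dbt-style SQL so the query can resolve refs against the project.
            "sql_query": "select status, count(*) from {{ ref('fct_orders') }} group by 1",
        },
    )
```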

Using the dbt MCP server for project execution

The dbt MCP server doesn't just provide access to data—it also allows LLMs and AI agents to interact directly with dbt, executing commands and managing your project.

  • For human stakeholders: Trigger dbt commands through conversational interfaces without CLI knowledge. Users can ask to "run the daily models" or "test the customer models" and get clear explanations of the results, including suggestions for fixing any issues that arise.
  • For AI agent workflows: Autonomously run dbt processes in response to events. Agents can manage project execution, automatically test and validate model changes, and even debug common issues without human intervention.

While the discovery and query tools operate on top of environments as the context source, these execution tools interact directly with the CLI, whether dbt Core or the dbt Cloud CLI.

Project Execution Tools:

| Tool Name | Purpose | Output |
| --- | --- | --- |
| build | Executes the dbt build command to build the entire project | Results of the build process including success/failure status and logs |
| compile | Executes the dbt compile command to compile the project's SQL | Results of the compilation process including success/failure status and logs |
| list | Lists all resources in the dbt project | Structured list of resources within the project |
| parse | Parses the dbt project files | Results of the parsing process including success/failure status and logs |
| run | Executes the dbt run command to run models in the project | Results of the run process including success/failure status and logs |
| test | Executes tests defined in the dbt project | Results of test execution including success/failure status and logs |
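
As a sketch, an agent could chain these CLI-backed tools to refresh and validate a slice of the project; the `selector` argument is an assumption about how the tools scope their work, not a documented parameter.

```python
from mcp import ClientSession


async def refresh_daily_models(session: ClientSession):
    """Sketch: run then test a tagged subset of models via the execution tools."""
    # Hypothetical argument: scope the run to a selector rather than the whole project.
    run_result = await session.call_tool("run", arguments={"selector": "tag:daily"})

    # Test what was just built; failures come back in the tool output for the LLM to triage.
    test_result = await session.call_tool("test", arguments={"selector": "tag:daily"})
    return run_result, test_result
```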

Getting Started

The dbt MCP server is now available as an experimental release. To get started:

  1. Clone the repository from GitHub: dbt-labs/dbt-mcp
  2. Follow the installation instructions in the README
  3. Connect your dbt project and start exploring the capabilities

We're excited to see how the community builds with and extends the dbt MCP server. Whether you're building an AI-powered BI tool, an autonomous data agent, or just exploring the possibilities of LLMs in your data workflows, the dbt MCP server provides a solid foundation for bringing your dbt context to AI applications.

What is the best workflow for the current iteration of the MCP server?

This early release is primarily meant to be used on top of an existing dbt project to answer questions about your data and metadata - roughly tracking towards the set of use cases described in this post on the future of BI and data consumption.

Chat use case:

We suggest using Claude Desktop for this and creating a custom project that includes a prompt explaining the use cases you are looking to cover.

To get this working:

  • Follow the instructions in the README to install the MCP server
  • Validate that you have added the MCP config to your Claude Desktop config. You should see ‘dbt’ when you go to Claude→Settings→Developer
Claude Desktop – MCP server running in Developer settings
  • Create a new project called “analytics”. Give it a description of how an end user might interact with it.
Example Claude Desktop project connected to the dbt MCP server
  • Add a custom prompt explaining that questions in this project will likely be routed through the dbt MCP server. You’ll likely want to customize this to your particular organizational context.
    • For example: This conversation is connected to and knows about the information in your dbt Project via the dbt MCP server. When you receive a question that plausibly needs data from an external data source, you will likely want to use the tools available via the dbt MCP server to provide it.

Deployment considerations:

  • This is an experimental release. We recommend that initial use should be focused on prototyping and proving value before rolling out widely across your organization.
  • Be particularly mindful with the project execution tools - remember that LLMs make mistakes, so begin with permissions scoped so that you can experiment without disrupting your data operations.
  • Start with the smallest possible use case that provides tangible value. Instead of giving this access to your entire production dbt Project, consider creating an upstream project that inherits a smaller subset of models and metrics that will power the workflow.
  • As of right now we don’t have perfect adherence for tool selection. In our testing, the model will sometimes cycle through several unnecessary tool calls or call them in the wrong order. While this can usually be fixed by more specific prompting by the end user, that goes against the spirit of allowing the model to dynamically select the right tool for the job. We expect this to be addressed over time via improvements in the dbt MCP Server, as well as client interfaces and the protocol itself.
  • Think carefully about the use cases for Semantic Layer tool vs. using the SQL execution tool. SQL execution is powerful but less controllable. We’re looking to do a lot of hands on testing to begin to develop heuristics about when SQL execution is the best option, when to bake logic into the Semantic Layer and whether there are new abstractions that might be needed for AI workflows.
  • Tool use is powerful because it can link multiple tools together. What tools complement the dbt MCP Server? How can we use this to tie our structured data into other workflows?

The future of the dbt MCP and the correct layers of abstraction for interfacing with your data

We are in the very early days of MCP as a protocol and determining how best to connect your structured data to LLM systems. This is an extremely exciting, dynamic time where we are working out, in real time, how to best serve this data and context.

We have high confidence that the approach of serving context to your AI systems via dbt will prove a durable piece of this stack. As we work with the Community on implementing this in real world use cases, it is quite likely that the details of the implementation and how you access it may change. Here are some of the areas we expect this to evolve.

Determining the best source of context for the dbt MCP

You’ll notice that these tools have two broad information inputs - dbt Cloud APIs and the dbt CLI. We expect to continue to build on both of these, with the dbt Cloud APIs serving as the abstraction of choice when it is desirable to operate off of a specific environment.

There will be other use cases, specifically for dbt development, where you’ll want to operate based off of your current working context. We’ll be releasing tooling for that in the near future (and welcome Community-submitted ideas and contributions). We’re looking forward to trying out alternative methods here and to hearing from the Community how you would like to have this context loaded in. Please feel free to experiment and share your findings with us.

Determining the most useful tools for the dbt MCP

What are the best and most useful set of tools to enable human in the loop and AI driven LLM access to structured data? The dbt MCP server presents our early explorations, but we anticipate that the Community will find many more.

How to handle hosting, authentication, RBAC and more

Currently, the dbt MCP server is locally hosted, with access managed via scoped service tokens from dbt Cloud or configured locally via your CLI. We expect there to be three levels at which we will continue to build out systems to make this not only safe and secure, but also tailored to the needs of the specific user (human or agent) accessing the MCP server.

  1. Hosting of the MCP: In the near future, we will have a Cloud-hosted version of the MCP server alongside the current local one
  2. Managing data access with the MCP: We are committed to offering safe and trustworthy access to data and data assets (think OAuth support and more)
  3. User and domain level context: Over the longer run, we are looking into ways to provide user- and domain-specific knowledge about your data assets to these systems as they query them.

Expect to hear more on this front on 5/28.

This is a new frontier for the whole Community. We need to be having open, honest discussions about how to integrate these systems into our existing workflows and open up new use cases.

To join the conversation, head over to #tools-dbt-mcp in the dbt Community Slack.

Establishing dbt Cloud: Securing your account through SSO & RBAC

· 9 min read
Brian Jan
Lead Cloud Onboarding Architect

As a dbt Cloud admin, you’ve just upgraded to dbt Cloud on the Enterprise plan - congrats! dbt Cloud has a lot to offer such as CI/CD, Orchestration, dbt Explorer, dbt Semantic Layer, dbt Mesh, Visual Editor, dbt Copilot, and so much more. But where should you begin?

We strongly recommend as you start adopting dbt Cloud functionality to make it a priority to set up Single-Sign On (SSO) and Role-Based Access Control (RBAC). This foundational step enables your organization to keep your data pipelines secure, onboard users into dbt Cloud with ease, and optimize cost savings for the long term.

Getting Started with git Branching Strategies and dbt

· 31 min read
Christine Berger
Resident Architect at dbt Labs
Carol Ohms
Resident Architect at dbt Labs
Taylor Dunlap
Senior Solutions Architect at dbt Labs
Steve Dowling
Senior Solutions Architect at dbt Labs

Hi! We’re Christine and Carol, Resident Architects at dbt Labs. Our day-to-day work is all about helping teams reach their technical and business-driven goals. Collaborating with a broad spectrum of customers ranging from scrappy startups to massive enterprises, we’ve gained valuable experience guiding teams to implement architecture which addresses their major pain points.

The information we’re about to share isn't just from our experiences - we frequently collaborate with other experts like Taylor Dunlap and Steve Dowling who have greatly contributed to the amalgamation of this guidance. Their work lies in being the critical bridge for teams between implementation and business outcomes, ultimately leading teams to align on a comprehensive technical vision through identification of problems and solutions.

Why are we here?
We help teams with dbt architecture, which encompasses the tools, processes and configurations used to start developing and deploying with dbt. There’s a lot of decision making that happens behind the scenes to standardize on these pieces - much of which is informed by understanding what we want the development workflow to look like. The focus on having the perfect workflow often gets teams stuck in heaps of planning and endless conversations, which slows down or even stops momentum on development. If you feel this, we’re hoping our guidance will give you a great sense of comfort in taking steps to unblock development - even when you don’t have everything figured out yet!

Parser, Better, Faster, Stronger: A peek at the new dbt engine

· 5 min read
Joel Labes
Senior Developer Experience Advocate at dbt Labs

Remember how dbt felt when you had a small project? You pressed enter and stuff just happened immediately? We're bringing that back.

Benchmarking tip: always try to get data that's good enough that you don't need to do statistics on it

After a series of deep dives into the guts of SQL comprehension, let's talk about speed a little bit. Specifically, I want to talk about one of the most annoying slowdowns as your project grows: project parsing.

When you're waiting a few seconds or a few minutes for things to start happening after you invoke dbt, it's because parsing isn't finished yet. But Lukas' SDF demo at last month's webinar didn't have a big wait, so why not?

The key technologies behind SQL Comprehension

· 16 min read
Dave Connors
Staff Developer Experience Advocate at dbt Labs

You ever wonder what’s really going on in your database when you fire off a (perfect, efficient, full-of-insight) SQL query?

OK, probably not 😅. Your personal tastes aside, we’ve been talking a lot about SQL Comprehension tools at dbt Labs in the wake of our acquisition of SDF Labs, and think that the community would benefit if we included them in the conversation too! We recently published a blog that talked about the different levels of SQL Comprehension tools. If you read that, you may have encountered a few new terms you weren’t super familiar with.

In this post, we’ll talk about the technologies that underpin SQL Comprehension tools in more detail. Hopefully, you come away with a deeper understanding of and appreciation for the hard work that your computer does to turn your SQL queries into actionable business insights!

The Three Levels of SQL Comprehension: What they are and why you need to know about them

· 9 min read
Joel Labes
Senior Developer Experience Advocate at dbt Labs

Ever since dbt Labs acquired SDF Labs last week, I've been head-down diving into their technology and making sense of it all. The main thing I knew going in was "SDF understands SQL". It's a nice pithy quote, but the specifics are fascinating.

For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a string preprocessor and into fully comprehending SQL. For the first time, SDF provides the technology necessary to make this possible. Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.

Why I wish I had a control plane for my renovation

· 4 min read
Mark Wan
Senior Solutions Architect at dbt Labs

When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be.

My wife pondering our sanity

We had to coordinate multiple elements:

  • The architects, who designed the layout, interior, and exterior.
  • The architectural plans, which outlined what the house should look like.
  • The builders, who executed those plans.
  • The inspectors, councils, and energy raters, who checked whether everything met the required standards.

Test smarter not harder: Where should tests go in your pipeline?

· 8 min read
Faith McKenna
Senior Technical Instructor at dbt Labs
Jerrie Kumalah Kenney
Resident Architect at dbt Labs

👋 Greetings, dbt’ers! It’s Faith & Jerrie, back again to offer tactical advice on where to put tests in your pipeline.

In our first post on refining testing best practices, we developed a prioritized list of data quality concerns. We also documented first steps for debugging each concern. This post will guide you on where specific tests should go in your data pipeline.

Note that we are constructing this guidance based on how we structure data at dbt Labs. You may use a different modeling approach—that’s okay! Translate our guidance to your data’s shape, and let us know in the comments section what modifications you made.

First, here are our opinions on where specific tests should go:

  • Source tests should be fixable data quality concerns. See the callout box below for what we mean by “fixable”.
  • Staging tests should be business-focused anomalies specific to individual tables, such as accepted ranges or ensuring sequential values. In addition to these tests, your staging layer should clean up any nulls, duplicates, or outliers that you can’t fix in your source system. You generally don’t need to test your cleanup efforts.
  • Intermediate and marts layer tests should be business-focused anomalies resulting specifically from joins or calculations. You also may consider adding additional primary key and not null tests on columns where it’s especially important to protect the grain.

Test smarter not harder: add the right tests to your dbt project

· 11 min read
Faith McKenna
Senior Technical Instructor at dbt Labs
Jerrie Kumalah Kenney
Resident Architect at dbt Labs

The Analytics Development Lifecycle (ADLC) is a workflow for improving data maturity and velocity. Testing is a key phase here. Many dbt developers tend to focus on primary keys and source freshness. We think there is a more holistic and in-depth path to tread. Testing is a key piece of the ADLC, and it should drive data quality.

In this blog, we’ll walk through a plan to define data quality. This will look like:

  • identifying data hygiene issues
  • identifying business-focused anomaly issues
  • identifying stats-focused anomaly issues

Once we have defined data quality, we’ll move on to prioritize those concerns. We will:

  • think through each concern in terms of the breadth of impact
  • decide if each concern should be at error or warning severity

Snowflake feature store and dbt: A bridge between data pipelines and ML

· 14 min read
Randy Pettus
Senior Partner Sales Engineer at Snowflake
Luis Leon
Partner Solutions Architect at dbt Labs

Flying home into Detroit this past week, working on this blog post on a plane, I saw for the first time the newly connected deck of the Gordie Howe International Bridge spanning the Detroit River and connecting the U.S. and Canada. The image stuck out because, in one sense, a feature store is a bridge between clean, consistent datasets and the machine learning models that rely upon this data. But more interesting than the bridge itself is the massive process of coordination needed to build it. This construction effort — I think — can teach us more about processes and the need for feature stores in machine learning (ML).

Think of the manufacturing materials needed as our data and the building of the bridge as the building of our ML models. There are thousands of engineers and construction workers taking materials from all over the world, pulling only the specific pieces needed for each part of the project. However, to make this project truly work at this scale, we need the warehousing and logistics to ensure that each load of concrete rebar and steel meets the standards for quality and safety needed and is available to the right people at the right time — as even a single fault can have catastrophic consequences or cause serious delays in project success. This warehouse and the associated logistics play the role of the feature store, ensuring that data is delivered consistently where and when it is needed to train and run ML models.

Iceberg Is An Implementation Detail

· 6 min read
Amy Chen
Product Manager at dbt Labs

If you haven’t paid attention to the data industry news cycle, you might have missed the recent excitement centered around an open table format called Apache Iceberg™. It’s one of many open table formats like Delta Lake, Hudi, and Hive. These formats are changing the way data is stored and metadata accessed. They are groundbreaking in many ways.

But I have to be honest: I don’t care. But not for the reasons you think.

How Hybrid Mesh unlocks dbt collaboration at scale

· 7 min read
Jason Ganz
Developer Experience at dbt Labs

One of the most important things that dbt does is unlock the ability for teams to collaborate on creating and disseminating organizational knowledge.

In the past, this primarily looked like a team working in one dbt Project to create a set of transformed objects in their data platform.

As dbt was adopted by larger organizations and began to drive workloads at a global scale, it became clear that we needed mechanisms to allow teams to operate independently from each other, creating and sharing data models across teams — dbt Mesh.

How to build a Semantic Layer in pieces: step-by-step for busy analytics engineers

· 10 min read
Gwen Windflower
Senior Developer Experience Advocate

The dbt Semantic Layer is founded on the idea that data transformation should be both flexible, allowing for on-the-fly aggregations grouped and filtered by definable dimensions, and version-controlled and tested. Like any other codebase, you should have confidence that your transformations express your organization’s business logic correctly. Historically, you had to choose between these options, but the dbt Semantic Layer brings them together. This has required new paradigms for how you express your transformations, though.

Putting Your DAG on the internet

· 5 min read
Ernesto Ongaro
Senior Solutions Architect at dbt Labs
Sebastian Stan
Data Engineer at EQT Group
Filip Byrén
VP and Software Architect at EQT Group

New in dbt: allow Snowflake Python models to access the internet

With dbt 1.8, dbt released support for Snowflake’s external access integrations, further enabling the use of dbt + AI to enrich your data. This allows querying of external APIs within dbt Python models, a functionality that was required by dbt Cloud customer EQT AB. Learn about why they needed it and how they helped build the feature and get it shipped!

Up and Running with Azure Synapse on dbt Cloud

· 11 min read
Anders Swanson
Senior Developer Experience Advocate at dbt Labs

At dbt Labs, we’ve always believed in meeting analytics engineers where they are. That’s why we’re so excited to announce that today, analytics engineers within the Microsoft Ecosystem can use dbt Cloud with not only Microsoft Fabric but also Azure Synapse Analytics Dedicated SQL Pools (ASADSP).

Since the early days of dbt, folks have been interested in having dbt support MSFT data platforms. Huge shoutout to Mikael Ene and Jacob Mastel for their efforts back in 2019 on the original SQL Server adapters (dbt-sqlserver and dbt-mssql, respectively).

The journey for the Azure Synapse dbt adapter, dbt-synapse, is closely tied to my journey with dbt. I was the one who forked dbt-sqlserver into dbt-synapse in April of 2020. I had first learned of dbt only a month earlier and knew immediately that my team needed the tool. With a great deal of assistance from Jeremy and experts at Microsoft, my team and I got it off the ground and started using it. When I left my team at Avanade in early 2022 to join dbt Labs, I joked that I wasn’t actually leaving the team; I was just temporarily embedding at dbt Labs to expedite dbt Labs getting into Cloud. Two years later, I can tell my team that the mission has been accomplished! Kudos to all the folks who have contributed to the TSQL adapters either directly in GitHub or in the community Slack channels. The integration would not exist if not for you!

Unit testing in dbt for test-driven development

· 9 min read
Doug Beatty
Senior Developer Experience Advocate at dbt Labs

Do you ever have "bad data" dreams? Or am I the only one that has recurring nightmares? 😱

Here's the one I had last night:

It began with a midnight bug hunt. A menacing insect creature has locked my colleagues in a dungeon, and they are pleading for my help to escape. Finding the key is elusive and always seems just beyond my grasp. The stress is palpable, a physical weight on my chest, as I race against time to unlock them.

Of course I wake up without actually having saved them, but I am relieved nonetheless. And I've had similar nightmares involving a heroic code refactor or the launch of a new model or feature.

Good news: beginning in dbt v1.8, we're introducing a first-class unit testing framework that can handle each of the scenarios from my data nightmares.

Before we dive into the details, let's take a quick look at how we got here.

Conversational Analytics: A Natural Language Interface to your Snowflake Data

· 12 min read
Doug Guthrie
Senior Solutions Architect at dbt Labs

Introduction

As a solutions architect at dbt Labs, my role is to help our customers and prospects understand how to best utilize the dbt Cloud platform to solve their unique data challenges. That uniqueness presents itself in different ways - organizational maturity, data stack, team size and composition, technical capability, use case, or some combination of those. With all those differences though, there has been one common thread throughout most of my engagements: Generative AI and Large Language Models (LLMs). Data teams are either 1) proactively thinking about applications for it in the context of their work or 2) being pushed to think about it by their stakeholders. It has become the elephant in every single (zoom) room I find myself in.

How we're making sure you can confidently switch to the "Latest" release track in dbt Cloud

· 10 min read
Michelle Ark
Staff Software Engineer at dbt Labs
Chenyu Li
Staff Software Engineer at dbt Labs
Colin Rogers
Senior Software Engineer at dbt Labs
Versionless is now the "latest" release track

This blog post was updated on December 04, 2024 to rename "versionless" to the "latest" release track, allowing for the introduction of less-frequent release tracks. Learn more about Release Tracks and how to use them.

As long as dbt Cloud has existed, it has required users to select a version of dbt Core to use under the hood in their jobs and environments. This made sense in the earliest days, when dbt Core minor versions often included breaking changes. It provided a clear way for everyone to know which version of the underlying runtime they were getting.

However, this came at a cost. While bumping a project's dbt version appeared as simple as selecting from a dropdown, there was real effort required to test the compatibility of the new version against existing projects, package dependencies, and adapters. On the other hand, putting this off meant foregoing access to new features and bug fixes in dbt.

But no more. Today, we're ready to announce the general availability of a new option in dbt Cloud: the "Latest" release track.

Maximum override: Configuring unique connections in dbt Cloud

· 6 min read
Gwen Windflower
Senior Developer Experience Advocate

dbt Cloud now includes a suite of new features that enable configuring precise and unique connections to data platforms at the environment and user level. These enable more sophisticated setups, like connecting a project to multiple warehouse accounts, first-class support for staging environments, and user-level overrides for specific dbt versions. This gives dbt Cloud developers the features they need to tackle more complex tasks, like Write-Audit-Publish (WAP) workflows and safely testing dbt version upgrades. While you still configure a default connection at the project level and per-developer, you now have tools to get more advanced in a secure way. Soon, dbt Cloud will take this even further, allowing multiple connections to be set globally and reused with global connections.

LLM-powered Analytics Engineering: How we're using AI inside of our dbt project, today, with no new tools.

· 10 min read
Joel Labes
Senior Developer Experience Advocate at dbt Labs

Cloud Data Platforms make new things possible; dbt helps you put them into production

The original paradigm shift that enabled dbt to exist and be useful was databases going to the cloud.

All of a sudden it was possible for more people to do better data work as huge blockers became huge opportunities:

  • We could now dynamically scale compute on-demand, without upgrading to a larger on-prem database.
  • We could now store and query enormous datasets like clickstream data, without pre-aggregating and transforming it.

Today, the next wave of innovation is happening in AI and LLMs, and it's coming to the cloud data platforms dbt practitioners are already using every day. For one example, Snowflake have just released their Cortex functions to access LLM-powered tools tuned for running common tasks against your existing datasets. In doing so, there is a new set of opportunities available to us: