Upgrade considerations
There are two paths for upgrading from Matillion ETL to Maia:
- Self Serve, which is detailed in these pages.
- Assisted, where you are supported by Matillion or a Matillion partner. Contact your Matillion Account Manager to discuss support for your upgrade project.
There are some prerequisite steps you need to take to set up Maia before upgrading, and some decisions you need to make about how you will use Maia, which are discussed below.
What do you need to consider before upgrading?
- Maia supports Snowflake, Databricks, and Amazon Redshift.
- Snowflake on GCP is only available in Maia in a Full SaaS environment at this time.
- Google BigQuery isn't supported in Maia at this time.
- Azure Synapse isn't supported in Maia at this time.
As a Matillion ETL user, you'll find that Maia has a very familiar look and feel, but there are some small terminology differences you'll need to bear in mind when reading the documentation:
| Matillion ETL | Maia |
|---|---|
| Job | Pipeline |
| Job variable | Pipeline variable |
| Environment variable | Project variable |
Maia and Matillion ETL have different architectures, and we recommend reading the following to gain an understanding of how Maia operates:
- Maia architecture.
- The SaaS delivery model.
- The billing model.
- Maia security and the security information in the Matillion Trust Center.
You can start using Maia in parallel with Matillion ETL, allowing you to test and evaluate features, and even to build and run new production pipelines, without migrating any workloads until you're ready. You also don't have to migrate every workload as a single operation: you can migrate a single workload and get it into production in Maia before moving on to the next.
Maia projects
Maia uses projects to logically group and separate workloads. You will be familiar with the concept of projects from Matillion ETL, and you may wish to create a set of projects and project folders that mirror your Matillion ETL project structure.
Different projects can have different users, permissions, and credentials associated with them in Maia, so you may want to take some time to plan this. Read Projects to understand what your options are when creating a Maia project, and consider the following choices.
Full SaaS or Hybrid SaaS?
Read Matillion Full SaaS vs Hybrid SaaS for an explanation of the differences. For a Hybrid SaaS project, you install our Maia Foundation runner software within your own cloud infrastructure and data plane. Read Create a Maia Foundation runner in your infrastructure for details of how to set up the Maia Foundation runner.
A Full SaaS project is the easiest way for a new customer to get started with learning Maia and to get initial pipelines running, but a Hybrid SaaS project is recommended if:
- You are migrating workloads that need the functionality of a Hybrid SaaS configuration. For example:
  - Python scripting operates differently under Full SaaS versus Hybrid SaaS.
  - Hybrid SaaS gives you the flexibility to upload your own libraries and drivers.
- Your data locality requirements need the Maia Foundation runner to run in a specific region where Matillion doesn't currently provide Maia Foundation runners.
- Your use case requires proximity between the data processing Maia Foundation runner and your source systems.
- You want the Maia Foundation runner to run in the same network location as Matillion ETL.
This decision is made on a per-project basis, and you can run both types of project simultaneously if required.
Git version control
Maia uses Git version control to keep track of changes made to projects and facilitate collaboration. Read Git in Designer to learn more.
You can choose to connect your own Git repository to Maia (bring your own Git, or BYOG), or use the Matillion-hosted Git repository that's provided as part of Maia. The Matillion-hosted Git repository is a convenient option if you don't already use a Git provider, but it has some limitations compared to using your own Git repository. In most scenarios, we recommend you connect your own Git repository, as it gives you much greater control and access to functionality within Git. This recommendation assumes you are already familiar with Git repositories and manage your own repository that you can connect.
Warning
Reusing an existing Matillion ETL Git repository isn't supported and can cause issues with your pipelines.
To use BYOG in Maia, you must have your repository set up in advance using one of the supported third-party Git providers, and have the appropriate connection details and credentials available when you create the Maia project.
The following comparison tables will help you determine which Git option is best for your organization.
| Use case | Choose |
|---|---|
| Trials and proof of value | Matillion hosted |
| One- or two-person data teams | Matillion hosted |
| Relatively simple use cases | Matillion hosted |
| Large or cross-functional data teams | BYOG |
| Complex use cases and workflows | BYOG |
| Greater control over repositories | BYOG |
| Feature | Matillion-hosted Git | BYOG |
|---|---|---|
| Branching | β | β |
| Commit, Push, Merge, Pull | β | β |
| Basic merge conflict resolution | β | β |
| Basic Git Reset | β | β |
| Basic Git Revert | β | β |
| Compare changes (Git Diff) | β | β |
| View commit history | β | β |
| Pull requests | β | β |
| Link out to Git provider | β | β |
| Branch protection rules | β | β |
| Advanced merge conflict resolution | β | β |
| Advanced Git Revert | β | β |
| Git tagging | β | β |
| Access to CI/CD tooling | β | β |
Environments
A Maia environment defines the connection between a project and your chosen cloud data platform. Environments include default configuration, such as a default warehouse, database, and schema, that can be used to pre-populate component properties in your pipelines. Ensure that you have your environment connection details and credentials available when you create your first project. Read Environments for details.
Credentials
Secrets, passwords, and OAuth connections that your jobs use to connect to third-party services must be recreated directly in Maia; for security reasons, we don't migrate these details. Make sure you know which credentials your workloads need, and have the details available to create those credentials.
Read Secrets and secret definitions, Cloud provider credentials, and OAuth for details.
Branches
Maia is designed around the concept of Git branches to provide version control and collaboration features. This is similar to the optional Git integration feature in Matillion ETL, but in Maia it's not optional and all pipelines must be assigned to Git branches.
Whether or not you currently use Git in Matillion ETL, ensure you have read and understood Git in Designer.
Plan how you will use branches to contain your migrated pipelines. Decide whether you want a simple development/production branch structure, whether you want separate branches for different development teams, and so on. Your project will have a main branch created by default, but good practice is to perform all development work in dedicated development branches, and only merge the work into main when ready for production.
Task concurrency
You need to consider how to manage concurrency after upgrading, because the Maia architecture can support more concurrent pipeline runs than Matillion ETL supports concurrent jobs. In most cases, this will improve performance without any issues, but there are some edge-case scenarios, outlined below, that you should be aware of and take steps to mitigate if necessary.
In Matillion ETL, the number of concurrent tasks that can run is determined by instance size. In Maia, the number of concurrent tasks is determined by the number of Maia Foundation runner instances you have running, and can therefore scale much higher than concurrency in Matillion ETL.
Each Maia pipeline execution is tied to one Maia Foundation runner, but can use all of the instances within that runner. For example, with one Maia Foundation runner scaled to eight instances, a single pipeline execution can have up to 160 concurrent task executions. With eight separate Maia Foundation runners each scaled to a single instance, a single pipeline execution can have a maximum of 20 concurrent task executions.
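The arithmetic behind these figures can be sketched as follows. This is an illustrative calculation only, not part of any Maia API: the per-instance limit of 20 concurrent tasks is an assumption inferred from the two examples above (160 tasks across eight instances, 20 tasks on one instance), and the function name is hypothetical.

```python
# Assumption: each Maia Foundation runner instance handles up to 20
# concurrent tasks. This value is inferred from the examples in the text
# (160 / 8 instances and 20 / 1 instance), not taken from a published limit.
TASKS_PER_INSTANCE = 20


def max_tasks_per_pipeline(instances_in_runner: int) -> int:
    """Upper bound on concurrent tasks for one pipeline execution.

    A pipeline execution is tied to a single runner, so only the
    instances within that runner count towards its concurrency.
    """
    return instances_in_runner * TASKS_PER_INSTANCE


# One runner scaled to eight instances.
print(max_tasks_per_pipeline(8))  # 160

# Eight separate runners, each with one instance: a pipeline execution
# still sees only the single instance of the runner it is tied to.
print(max_tasks_per_pipeline(1))  # 20
```

Under this assumption, scaling instances within one runner raises the ceiling for a single pipeline execution, whereas adding separate runners does not.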
This greater concurrency may cause issues where:
- The greater throughput from concurrent tasks in Maia may cause you to hit limits in your cloud data warehouse, where requests could queue and time out before being processed. To mitigate this risk, you can refactor pipelines to reduce the load on your cloud data warehouse at any one time, or you may be able to configure your cloud data warehouse to handle the request queue more efficiently.
- Two processes in Matillion ETL may have always run sequentially because the limit on concurrent processes prevented them from running at the same time, even though there was no explicit link between them. In Maia, these processes may run concurrently or in a different order, which could cause issues if there are dependencies between them. You should review your pipelines to identify any cases where this could be an issue, and add links between processes where necessary to ensure they run in the correct order.
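Both mitigations can be modelled outside Maia to reason about a migration plan. The following is a minimal sketch using Python's standard `graphlib` module; it doesn't call any Maia API, and the pipeline names, the `links` mapping, and the `run_order` function are hypothetical. It shows how an explicit link forces the correct order, and how capping the number of pipelines that run together limits the load placed on the cloud data warehouse at any one time.

```python
from graphlib import TopologicalSorter

# Hypothetical pipelines. Each key maps to the set of pipelines that must
# finish before it starts. In Matillion ETL, "build_report" may only have
# run after "load_staging" because of the concurrency limit; here that
# dependency is made explicit.
links = {
    "load_staging": set(),
    "load_reference": set(),
    "build_report": {"load_staging"},
}


def run_order(links, max_concurrent):
    """Return batches of pipelines that may run together.

    Batches honor the explicit links and never contain more than
    max_concurrent pipelines.
    """
    sorter = TopologicalSorter(links)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything whose predecessors have all completed.
        ready = sorted(sorter.get_ready())
        # Split the ready set so no batch exceeds the concurrency cap.
        for start in range(0, len(ready), max_concurrent):
            batches.append(ready[start:start + max_concurrent])
        sorter.done(*ready)
    return batches


print(run_order(links, max_concurrent=2))
# [['load_reference', 'load_staging'], ['build_report']]
print(run_order(links, max_concurrent=1))
# [['load_reference'], ['load_staging'], ['build_report']]
```

In both cases `build_report` runs only after `load_staging` has completed, regardless of how much concurrency is available.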