Scaling best practices

Agent Type: Maia Hybrid

Agent Platform: AWS ✅ Azure ✅ Snowflake ✅

Maia executes pipelines via a Maia Foundation runner. This works by decomposing your pipelines into tasks, which are then distributed across the instances (nodes) of a Maia Foundation runner.

When hosting Maia Foundation runners within your VPC or VNet (also known as a Hybrid SaaS solution), you must right-size them to obtain the level of performance and concurrency you need.

This guide provides details of the key considerations.

Note

If you're using our Full SaaS offering, we advise you to contact us to discuss scaling your Maia Foundation runner.

Tasks

Tasks are the smallest unit of work a Maia Foundation runner can execute and consist of items such as:

A single orchestration component execution.
A specific execution of a transformation pipeline.

Note

Using Designer also generates tasks for actions such as running a sample operation or loading a list of tables or columns. However, no limits are applied to these design-time tasks.

Maia Foundation runner instances

In Maia, work is executed via Maia Foundation runners, and each Maia Foundation runner is made up of Maia Foundation runner instances. In practice, this is implemented using containers. A Maia Foundation runner is a named collection of containers, with each container being known as a Maia Foundation runner instance.

When pipeline tasks are sent to a Maia Foundation runner, they will be sent to any Maia Foundation runner instance that has capacity. If there is no current capacity, then the pipeline task will be queued. When an instance subsequently has capacity, the pipeline task will be sent to that instance and be executed.

Note

Maia Foundation runners should be configured with a minimum of 2 Maia Foundation runner instances. Instances are upgraded in a staggered fashion, so running at least two ensures that the automatic upgrade process does not cause a service outage.

Maia Foundation runner instance capacity

To protect the stability of the Maia Foundation runner instances under load, a Maia Foundation runner instance won't take on a new task if any of the following is true:

CPU usage exceeds 80%.
RAM usage reaches the default maximum heap size (60% of system RAM).
The Maia Foundation runner instance is already running 20 concurrent tasks.

Tasks that can't execute because there is no available Maia Foundation runner instance will queue until a Maia Foundation runner instance becomes available.
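The three thresholds above can be modelled as a small admission check. This is only an illustrative sketch: the function name `can_accept_task` and its arguments are hypothetical, and it is not how the Maia Foundation runner is implemented or an interface it exposes. It takes whole-number percentages and a running task count, and succeeds only when none of the documented limits has been hit.

```shell
# Illustrative model of the documented thresholds; not a Maia interface.
MAX_CPU_PERCENT=80        # a new task is refused when CPU usage exceeds this
MAX_HEAP_PERCENT=60       # default maximum heap size, as a % of system RAM
MAX_CONCURRENT_TASKS=20   # per-instance concurrency limit

# can_accept_task CPU_PERCENT RAM_PERCENT RUNNING_TASKS  (whole numbers)
# Returns 0 if the instance would take a new task, 1 if the task would queue.
can_accept_task() {
  [ "$1" -le "$MAX_CPU_PERCENT" ] || return 1        # CPU usage exceeds 80%
  [ "$2" -lt "$MAX_HEAP_PERCENT" ] || return 1       # RAM has reached the heap limit
  [ "$3" -lt "$MAX_CONCURRENT_TASKS" ] || return 1   # already running 20 tasks
  return 0
}
```

For example, `can_accept_task 50 30 5` succeeds, while `can_accept_task 50 30 20` fails because the instance is already running 20 tasks.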

Note

If you consistently see tasks queuing or Maia Foundation runner instances frequently reaching these thresholds, consider scaling your deployment by adding more Maia Foundation runner instances.

Scaling for load

Horizontal scaling

Each Maia Foundation runner instance is limited to 20 concurrent tasks at any one time, regardless of the amount of resources assigned to it. As a result, pipelines with a high level of concurrency can cause tasks to queue, which makes the overall pipeline execution take longer.

Horizontal scaling involves adding more Maia Foundation runner instances. Each added instance increases the number of tasks that can run in parallel (two Maia Foundation runner instances allow 40 concurrent tasks to be executed, and so on), which reduces task queuing.
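As a rough sizing aid, the arithmetic can be sketched in shell. The function names `instances_needed` and `queued_tasks` are hypothetical. The sketch considers only the limit of 20 concurrent tasks per instance and the recommended minimum of 2 instances. It ignores the CPU and RAM thresholds, so treat the result as a lower bound rather than a guarantee.

```shell
TASKS_PER_INSTANCE=20   # documented per-instance concurrency limit
MIN_INSTANCES=2         # recommended minimum, so staggered upgrades cause no outage

# instances_needed PEAK_CONCURRENT_TASKS
# Prints the smallest instance count whose combined limit covers the peak.
instances_needed() {
  needed=$(( ($1 + TASKS_PER_INSTANCE - 1) / TASKS_PER_INSTANCE ))  # round up
  if [ "$needed" -lt "$MIN_INSTANCES" ]; then
    needed=$MIN_INSTANCES
  fi
  echo "$needed"
}

# queued_tasks CONCURRENT_TASKS INSTANCES
# Prints how many tasks would have to wait at that concurrency level.
queued_tasks() {
  over=$(( $1 - $2 * TASKS_PER_INSTANCE ))
  if [ "$over" -lt 0 ]; then
    over=0
  fi
  echo "$over"
}
```

For example, `instances_needed 45` prints `3`, and `queued_tasks 45 2` prints `5`, because two instances can run only 40 of the 45 tasks at once.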

The method for adding Maia Foundation runner instances varies depending on your container orchestrator. See the documentation for your platform for detailed instructions.

Cost implications

Adding more Maia Foundation runner instances does not result in extra charges from Matillion. Our credit charges are based on task execution time. Task queuing time does not consume credits.

However, running extra containers (i.e. Maia Foundation runner instances) is likely to increase the infrastructure cost from your container orchestrator (e.g. AWS Fargate).

As such, choosing the number of Maia Foundation runner instances means balancing the performance you want against your infrastructure cost and budget.

Transformation tasks - low load

Since transformation tasks generate SQL that is then executed by your cloud data warehouse, these tasks do not require a large amount of CPU time or memory on the Maia Foundation runner instances. With this in mind, if your workload is "transformation heavy", a smaller Maia Foundation runner (with a low number of Maia Foundation runner instances) will likely suffice.

Data ingestion and scripting - high load

Components that move or ingest data, as well as those that execute custom scripts such as Python or Bash, place a high CPU and memory burden on Maia Foundation runner instances. If your workloads involve high-volume data ingestion or custom scripting, you'll need to run a larger number of Maia Foundation runner instances.

Further considerations

Scale up delay

Once you have edited the Maia Foundation runner service to start more Maia Foundation runner instances, there is a delay of approximately 4 minutes for the Maia Foundation runner instances to start and dial back to Maia to begin accepting tasks.

Scaling AWS Maia Foundation runners from within a pipeline

If you use AWS ECS Fargate to run your Maia Foundation runners, you can scale up and down from within a pipeline. The AWS command line tools are available within the Bash Script component, and you can use them to edit the ECS service and change the desired number of instances.

Bash scripts executed in Hybrid SaaS environments obtain the IAM permissions assigned to the Maia Foundation runner. If these permissions include amending an ECS Fargate service, then a script can be used to change the number of Maia Foundation runner instances.

The script below can be used within a Bash Script orchestration component to do this:

###
# This script alters the desired task count for a Matillion Agent.
# Set the variables below to the values shown in the AWS ECS Console.
# Note: new agent instances usually take around 4 minutes to be available for task processing.
###

AWS_REGION="<AWS region e.g. eu-west-1>"
AWS_ECS_SERVICE="<ECS Fargate service name>"
AWS_ECS_CLUSTER="<ECS cluster name>"
DESIRED_AGENT_COUNT=2

# Update the ECS service so that it runs the desired number of instances.
aws ecs update-service --service "$AWS_ECS_SERVICE" --desired-count "$DESIRED_AGENT_COUNT" --region "$AWS_REGION" --cluster "$AWS_ECS_CLUSTER"

This script needs the following permission in the IAM role attached to the Task Definition as the Task Role for the Maia Foundation runner:

ecs:UpdateService

You must add this permission before running the script, as it's not added by default.
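One way to grant the permission is an inline policy on the task role. The sketch below only prints a candidate policy document. The function name `print_scaling_policy` is hypothetical, and the resource ARN layout (`service/<cluster>/<service>`) is an assumption that you should check against the service ARN shown in your ECS console.

```shell
# print_scaling_policy REGION ACCOUNT_ID CLUSTER_NAME SERVICE_NAME
# Prints an IAM policy document allowing ecs:UpdateService on one service only.
print_scaling_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecs:UpdateService",
      "Resource": "arn:aws:ecs:$1:$2:service/$3/$4"
    }
  ]
}
EOF
}
```

You could save the output to a file and attach it with `aws iam put-role-policy --role-name <task role name> --policy-name <policy name> --policy-document file://policy.json`, where the role and policy names are placeholders for your own values. Scoping the resource to a single service keeps a pipeline script from resizing unrelated ECS services.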

Note

New Maia Foundation runner instances take around 4 minutes to become available. Please consider this when scheduling your scaling events.
Be aware of other pipelines or users who may be relying on the Maia Foundation runner—this resize will affect any users or pipelines using the Maia Foundation runner.
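If a later pipeline step depends on the extra capacity, a script can wait for the resize before continuing. The following is a minimal sketch using the AWS CLI's `aws ecs wait services-stable` command. The function name `scale_runner` is hypothetical. The waiter reports only that ECS considers the service stable, so it does not confirm that the new instances have connected back to Maia and started accepting tasks. The waiter also needs the `ecs:DescribeServices` permission in addition to `ecs:UpdateService`.

```shell
# scale_runner CLUSTER SERVICE REGION DESIRED_COUNT
# Changes the desired count, then blocks until ECS reports the service as stable.
scale_runner() {
  aws ecs update-service --cluster "$1" --service "$2" --region "$3" \
    --desired-count "$4" >/dev/null || return 1

  # Polls the service until running tasks match the desired count,
  # and returns non-zero if the waiter gives up first.
  aws ecs wait services-stable --cluster "$1" --services "$2" --region "$3"
}
```

A call such as `scale_runner my-cluster my-runner-service eu-west-1 3` (all values are placeholders) would request three instances and return once ECS reports them as running.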

Scaling Snowflake

If you're not seeing the performance expected—even when using a Matillion Full SaaS solution of sufficient size for your workloads—it might be that your Snowflake warehouse defined in the Matillion environment needs scaling. Read Monitoring Warehouse Load to learn more.

If a warehouse is overloaded with many parallel queries, the queries will queue. A large queue time shown in the Snowflake graph indicates pipeline performance will benefit from scaling the warehouse.

Matillion recommends enabling multi-cluster warehousing if this is available in your Snowflake account. Using this mechanism, Snowflake will horizontally scale the warehouse by starting and stopping instances of the warehouse automatically. Matillion has found that this improves concurrent performance more than simply increasing the size of the warehouse.
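In Snowflake, multi-cluster behaviour is controlled by the warehouse's `MIN_CLUSTER_COUNT` and `MAX_CLUSTER_COUNT` properties. As a sketch, the hypothetical helper below builds the corresponding `ALTER WAREHOUSE` statement. The warehouse name and cluster counts are illustrative values, not recommendations, and the sketch assumes a warehouse name that needs no quoting.

```shell
# multi_cluster_sql WAREHOUSE MIN_CLUSTERS MAX_CLUSTERS
# Prints an ALTER WAREHOUSE statement; fails if the minimum exceeds the maximum.
multi_cluster_sql() {
  [ "$2" -le "$3" ] || return 1
  echo "ALTER WAREHOUSE $1 SET MIN_CLUSTER_COUNT = $2 MAX_CLUSTER_COUNT = $3;"
}
```

For example, `multi_cluster_sql MY_WH 1 3` prints a statement that lets Snowflake run between one and three clusters for `MY_WH`. You would then run that statement in your usual SQL client.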

Scaling Maia Foundation runner for Snowflake

You can scale the number of Maia Foundation runner instances within Snowflake, which allows you to increase or decrease concurrency to better handle your pipeline workload.

Warning

Scaling down the number of Maia Foundation runner instances while pipelines are running and the Maia Foundation runner is not in a Paused state can lead to pipeline failures.

From your Snowflake Home screen, click Data Products → Apps. You must be using the role that originally installed the application.
Locate Matillion Maia in the list of apps, and click to select it. If you have multiple installs of the Native App, select the one you wish to scale.
Click the Control Panel tab.
Under Agent scaling, enter the number of Maia Foundation runner instances you want to run. The maximum number of Maia Foundation runner instances is 10.
Click Apply, then Apply again to confirm.

This will put the compute pool into a Resizing state and change the Maia Foundation runner status to Pending. The Maia Foundation runner status will change to Running after a few minutes.