
GCP Streaming runner install

The Streaming runner is a component within Maia that serves as a bridge between the source database and the target cloud data lake or cloud storage, enabling the execution and scheduling of streaming pipelines. The Streaming runner will be hosted in your own infrastructure, using a Hybrid SaaS solution.

Once the Streaming runner is configured and started, it operates autonomously without requiring much intervention. The Streaming runner continuously monitors all changes occurring in the source database, consumes those changes from the low-level logs, and delivers them to the designated target data lake or storage. This ensures a continuous and reliable change data capture process.

This topic explains how to create a Streaming runner in your GCP infrastructure.

Note

Each Streaming runner can run only one Streaming pipeline. Each Streaming pipeline requires a new Streaming runner installation.


Prerequisites

  • A Maia account. To register, read Registration. Once you have signed up, log in to Maia.
  • An account in GCP to host the Streaming runner.
  • Access to a cloud secrets service, which is a secure storage system for storing authentication and connection secrets. These secrets are used by the Streaming runner to authenticate itself with the source database and establish a secure connection for capturing the data changes.

Create a Streaming runner

  1. Click Agents & Instances → Agents. The Agents page lists all Streaming runners currently created, showing their Status, Platform, and Type.
  2. Click Add agent.
  3. Click Streaming.
  4. Complete the following properties:

    • Agent name: A unique name for your new Streaming runner. Maximum 30 characters. Accepts uppercase and lowercase letters (A-Z, a-z), digits (0-9), whitespace (not as the first character), hyphens, and underscores.
    • Description: Optionally enter a brief description of the Streaming runner.
    • Cloud provider: The cloud platform that the Streaming runner will be deployed to. Select GCP.
    • Deployment: The Streaming runner deployment method. Choose from GCE (Google Compute Engine) or GKE (Google Kubernetes Engine).
  5. Click Create agent.

  6. This creates a Streaming runner definition in Maia, and displays the following parameters on the Agent details page. You will need these parameters to set up the Streaming runner in your GCP infrastructure.

    • ACCOUNT_ID
    • AGENT_ID
    • MATILLION_REGION
    • OAUTH_CLIENT_ID
    • OAUTH_CLIENT_SECRET

    The screen will also show any optional environment variables needed by the Streaming runner.

    Click Reveal to make the OAUTH_CLIENT_ID and OAUTH_CLIENT_SECRET values visible before copying.

  7. The Streaming runner's status is set to Pending, which means it is not yet ready to run pipelines. The next step is to deploy the Streaming runner application into your GCP infrastructure, as described below.
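
The naming rule in step 4 and the parameter list in step 6 can be checked before you start the deployment. The following is a minimal Python sketch, not part of Maia: the function names are hypothetical, it assumes the parameters are supplied to the Streaming runner as environment variables, and it interprets "whitespace" as the space character and requires at least one character in the name.

```python
import os
import re
from typing import List, Mapping, Optional

# Parameter names as listed on the Agent details page.
REQUIRED_PARAMETERS = (
    "ACCOUNT_ID",
    "AGENT_ID",
    "MATILLION_REGION",
    "OAUTH_CLIENT_ID",
    "OAUTH_CLIENT_SECRET",
)

# Assumption: "whitespace" means the space character, and a name needs
# at least one character. The first character may not be a space, and
# the whole name may be at most 30 characters long.
_AGENT_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9 _-]{0,29}")


def is_valid_agent_name(name: str) -> bool:
    """Return True if the name follows the documented Agent name rule."""
    return _AGENT_NAME.fullmatch(name) is not None


def missing_parameters(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required parameter names that are unset or blank.

    Reads from os.environ unless another mapping is passed in.
    """
    source = os.environ if env is None else env
    return [key for key in REQUIRED_PARAMETERS if not source.get(key, "").strip()]
```

For example, calling `missing_parameters()` with no arguments on the host where the Streaming runner will start returns the names that still need to be copied from the Agent details page. An empty list means all five are present.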


Set up the Streaming runner in your GCP infrastructure

Setting up Streaming runners requires access to the GCP platform and services, as well as a degree of familiarity with that platform. We recommend going through this process with your GCP platform administrator.

Note

GCP resources may come with their own pricing independent of any billing you receive from Matillion.

Recommendations:

  • Create new resources specifically for streaming use rather than reusing existing cloud resources.
  • Group your new resources in a dedicated GCP project for easier organization and billing.
  • Consult your cloud/network administrator for advice on GCP permissions, roles, access and other considerations such as GCP regions.
  • Keep resources in the same Google Cloud region. Not all resources and services are available in every region, so research your chosen region in advance.

Check Streaming runner status

After deploying the Streaming runner in your GCP infrastructure, you should return to Maia to verify that it's correctly connected and running.

  1. Click Agents & Instances → Agents. The Agents page lists all Streaming runners currently created.
  2. Locate the Streaming runner in the list and check the status:

    • Pending: The Streaming runner has been created but has not yet connected to Maia.
    • Running: The Streaming runner is connected and available for running Streaming pipelines, or is connected and already running a Streaming pipeline.
    • Stopped: The Streaming runner has been stopped.
    • Unknown: The Streaming runner is in an unknown state. This typically means the Streaming runner has lost connection to Maia without being stopped, for example due to networking issues.
  3. When the Streaming runner status shows Running, it's ready to use. It can be selected in the Agent drop-down when you create a new Streaming pipeline, as long as no pipeline is already assigned to it.
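
The status check above can be summarized as a small decision rule. The Python sketch below is illustrative only: it does not call any Maia API, the names `RunnerStatus` and `can_assign_pipeline` are hypothetical, and it assumes you pass in the status text exactly as shown on the Agents page.

```python
from enum import Enum


class RunnerStatus(Enum):
    """The four statuses shown for a Streaming runner on the Agents page."""

    PENDING = "Pending"   # created, but not yet connected to Maia
    RUNNING = "Running"   # connected; idle or already running a pipeline
    STOPPED = "Stopped"   # has been stopped
    UNKNOWN = "Unknown"   # lost connection to Maia without being stopped


def can_assign_pipeline(status: str, pipeline_assigned: bool) -> bool:
    """Return True if a new Streaming pipeline could use this runner.

    A runner qualifies only when its status is Running and no pipeline
    is assigned to it, because each runner runs one pipeline.
    """
    try:
        parsed = RunnerStatus(status)
    except ValueError:
        # Unrecognized status text: treat the runner as not ready.
        return False
    return parsed is RunnerStatus.RUNNING and not pipeline_assigned
```

Under this rule, a runner showing Running with an existing pipeline is healthy but unavailable, which matches the note that each Streaming pipeline requires its own Streaming runner installation.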