OVH Data Collector

OVH Data Collector allows you to replicate, query and transport your data


You can process data from any backend and feed any application with the changes that happen remotely to your data.

Data Collector is a cloud-hosted agent that scales to optimize resource usage and can be controlled remotely.

With its plugin-based structure, our collector aims to support any new Source or Sink Connector.

Performance

 

Transport your data as fast as you need to!

Data Collector is limited only by the speed of the network and of the sources.

 

Reliability

 

Always stay up-to-date in your transfer!

Data Collector is fault tolerant. If a failure occurs, it restarts from the last data collected.

Keep your data safe!

All data transfers can be encrypted.

On-premises availability (contact us)

Keep your data governance under control!

You can choose the data you want to transport and ignore the rest.

 

 

Simplicity

 

Easy to use and administer!

- Data Collector is deployed in OVH Cloud

- Data Collector supports full remote control

 

 


Data Collector Pipeline

Technical View

  • 300 000 Events/s in "Query" Mode
  • ~40 000 Events/s in "Change data capture" Mode
  • Containerized agents based on Mesos
  • No JVM needed, developed in Go
  • Data Collector is remotely controlled through an API
  • Supports multiple types of sources and sinks (see below)
  • Kafka topic provided with Data Collector

Current Compatibility Status

Source Connectors | Query/import Status | Change Data Capture Status

  • MySQL™
  • MariaDB™
  • PostgreSQL™
  • Microsoft SQL Server™
  • Oracle® Database
  • MongoDB™
  • Couchbase™
  • DB2™
  • SAP HANA™
  • Snowflake™
  • Elasticsearch™
  • DynamoDB™
  • Redshift™
  • Google BigQuery™
  • File
  • REST API

Sink Connectors | Status

  • Kafka™ (topic provided with Data Collector)
  • HDFS
  • Apache Phoenix™
  • Apache Hive™
  • Apache HBase™
  • PostgreSQL™

FAQ

After accepting terms and conditions, please fill in the form below.

Once you have ordered an agent, you should receive the following information:

  • The ID of the agent you just ordered
  • If necessary, your tenant ID and the password associated with it

However, the agent that has been created is an empty shell and still needs to be configured before you can use it.

This tutorial therefore aims to help you configure your new Data Collector.

Here is the data we will use during this tutorial:

  • The tenant id will be: user-tutorial
  • The password associated with it will be: password-tutorial
  • The agent ID will be: data-collector-agent-tutorial

We will use our Swagger UI to call the API, but everything can also be done with curl commands or other tools, according to your preferences.
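For readers who prefer a script to the Swagger UI, here is a minimal Python sketch of the token request described in step 1 of the guide below. It only builds the request and does not send it. The helper names (`basic_auth_header`, `build_token_request`) are hypothetical and chosen for illustration; the base URL, the route and the credentials are the ones used in this tutorial.

```python
import base64
import urllib.request

# Base URL taken from step 1 of the guide.
API_BASE = "https://api.dataconvergence.ovh.com"


def basic_auth_header(tenant_id: str, password: str) -> str:
    """Return 'Basic ' followed by Base64('tenant_id:password')."""
    raw = f"{tenant_id}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_token_request(tenant_id: str, password: str) -> urllib.request.Request:
    """Build (without sending) the POST /auth/token request."""
    return urllib.request.Request(
        url=f"{API_BASE}/auth/token",
        method="POST",
        headers={"Authorization": basic_auth_header(tenant_id, password)},
    )


# Tutorial credentials; replace them with your own tenant ID and password.
request = build_token_request("user-tutorial", "password-tutorial")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Passing `request` to `urllib.request.urlopen` would actually send it. The exact format of the response body is not documented here, so the sketch stops before parsing it.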

Step-by-step guide

 

  1. First, we have to generate a JWT token, which is signed and contains the claims that determine which routes you are allowed to use. By default, any new tenant can use all routes of the Data Collector API, but this might be restricted later, for instance if some endpoints are created for administration purposes.

    1. In your browser, go to the following URL: https://api.dataconvergence.ovh.com

    2. You should see something similar to this:
      (screenshot "howto 1")

    3. Click on "Auth" and then on the "POST /auth/token" route. You should see something similar to this:
      (screenshot "howto 2")
    4. In the Authorization field, enter "Basic " followed by "your username:your password" encoded in Base64. In this tutorial, the username is the tenant ID.
      1. To encode in Base64, you can use the following command on Linux:

        echo -n 'your username:your password' | base64
      2. For our example, "echo -n 'user-tutorial:password-tutorial' | base64" gives us "dXNlci10dXRvcmlhbDpwYXNzd29yZC10dXRvcmlhbA==", so the Authorization field will contain "Basic dXNlci10dXRvcmlhbDpwYXNzd29yZC10dXRvcmlhbA==".
    5. Now, click on "Try it out!"

    6. The Response Body should now show the JWT token generated for these credentials. Keep it somewhere safe to avoid having to regenerate one. As of now, any generated token is valid for one day only.
  2. This step is optional. It lets you check the list of all the agent IDs that belong to you.

    1. Click on the "GET /agents" route in the "Data Collector" section. It should look similar to this:
    2. In the Authorization field, enter "Bearer " followed by the JWT token generated in step 1:
    3. Click on "Try it out!" and check the resulting IDs in the Response Body:
  3. The next step is to configure a data source for your Data Collector.
    1. We will start by listing all the currently supported sources. Expand the "GET /agents/{agentId}/sources/availableSources" route. It should look like this:
    2. Fill in the Authorization field with "Bearer " followed by the JWT token generated in step 1.
    3. Fill in the agentId field with the ID of the agent you want to configure (in our example, "data-collector-agent-tutorial"):
    4. Click on "Try it out!" and check the Response Body to select the source you want to use.
    5. Here, we will configure a "mysqlQuery" source. As specified in the previous response, this source requires the following information: name, user, password, host and port.
    6. Open the "POST /agents/{agentId}/sources/{source}" route and fill in the fields (Authorization and agentId as before, "mysqlQuery" in "source").
      In the "body" field, enter the "mysqlQuery" JSON from the previous call, filled in with your values:

      {
          "name": "tutorial-source",
          "user": "tutorial-source-user",
          "password": "tutorial-source-password",
          "host": "10.0.0.1",
          "port": 3306
      }


      It should look like this:

    7. Click on "Try it out!" and check the Response Body. It should look like this:
    8. To make sure everything is okay, open the "GET /agents/{agentId}/sources/{source}" route and fill in the fields (Authorization and agentId as before, "tutorial-source" in "source"). Click on "Try it out!" and it should give you the resulting data:

      * Note that the password is not included in the response, for obvious security reasons.
    9. Your source is now configured properly!
  4. The next step is to prepare the sink for your data collector. As of now, only a Kafka sink, configured by OVH, is available.
    1. To confirm that no other sinks are available, we will start by calling the "GET /agents/{agentId}/sinks/availableSinks" route, which should give you the following response:
    2. Now, just call the "POST /agents/{agentId}/sinks/{sink}" route with "ovhkafka" in the "sink" field and the following in the "body" field:

      {
          "name": "tutorial-sink"
      }
    3. It should give you the following response:
    4. As with the source, let's check the configuration of the sink, which gives you the information needed to read events from the Kafka topic created for you. Call the "GET /agents/{agentId}/sinks/tutorial-sink" route with the right parameters, which should give you the following response:
    5. With this, you now have all the information necessary to connect to your Data Collector's Kafka topic.
  5. The last step is to deploy your agent!
    1. Just launch the "PUT /agents/{agentId}/deploy" route and wait for a 200 response!
  6. And you are done! Your Data Collector is now configured and up and running. If your data source is accessible, you should now see your events arriving into Kafka!
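
The configuration calls from steps 3 to 5 can also be scripted. The following Python sketch builds, without sending them, the three requests used in this tutorial: create the "mysqlQuery" source, create the "ovhkafka" sink, and deploy the agent. The helper names (`api_request`, `configuration_requests`) are hypothetical, and sending the bodies as JSON with a `Content-Type: application/json` header is an assumption; the routes and payloads are the ones shown in the guide.

```python
import json
import urllib.request

# Base URL taken from step 1 of the guide.
API_BASE = "https://api.dataconvergence.ovh.com"


def api_request(method, path, token, body=None):
    """Build (without sending) a request authenticated with a Bearer token."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # Assumption: the API expects the body as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{API_BASE}{path}", data=data, headers=headers, method=method
    )


def configuration_requests(agent_id, token):
    """Return the source, sink and deploy requests, in the order of the guide."""
    source = {
        "name": "tutorial-source",
        "user": "tutorial-source-user",
        "password": "tutorial-source-password",
        "host": "10.0.0.1",
        "port": 3306,
    }
    sink = {"name": "tutorial-sink"}
    base = f"/agents/{agent_id}"
    return [
        api_request("POST", f"{base}/sources/mysqlQuery", token, source),
        api_request("POST", f"{base}/sinks/ovhkafka", token, sink),
        api_request("PUT", f"{base}/deploy", token),
    ]


# Placeholder token; use the JWT token obtained in step 1.
for req in configuration_requests("data-collector-agent-tutorial", "<your JWT token>"):
    print(req.get_method(), req.full_url)
```

Each request can then be sent in order with `urllib.request.urlopen(req)`, checking the response of one call before issuing the next.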

Status

  • ALPHA
  • BETA
  • GAMMA

Register