OVHcloud Data Processing

Process your data easily and at scale, powered by Apache Spark

You need Apache Spark computation at scale, but you don't have the computers?
You don't have time to set up a cluster of computers and work through all the installation and configuration?
You only need a cluster for a few minutes, not forever?
Or you simply want an easy way to try out the power of Apache Spark?

Then try our new solution, OVHcloud Data Processing.

 

Now available! Discover our offers

 

How it works

Apache Spark is a framework that lets you perform data processing at large scale, with speed and ease of use.

OVHcloud Data Processing simplifies your tasks. You come with your Apache Spark code (Java or Python) and your data; we take care of everything else: setting up a cluster, processing your data, and deleting the resources when the job is done.

 

You can find a few examples and use cases in the Apache Spark documentation: https://spark.apache.org/

OVHcloud Data Processing workflow

Getting Started

You can find the full OVHcloud Data Processing documentation here: https://docs.ovh.com/gb/en/data-processing/

 

Step 1: Connect to the OVHcloud Control Panel

 

Connect to the OVHcloud Control Panel and find Data Processing in the Public Cloud section of the left-hand menu.

Data Processing Start Page

Step 2: Upload your Python application code and requirements file

Before running your job on the Data Processing platform, you need to create a container in your Object Storage and upload your application's Python files and its environment.yml file into this container.

Please see Creating Storage Containers in Customer Panel or Create an object container in Horizon for more details.
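
If you prefer to script this step, here is a minimal sketch using python-swiftclient, the OpenStack Swift client. The credentials, region, container and file names are placeholders for your own project, not values required by the platform:

    import swiftclient

    # Authenticate against the OVHcloud Keystone endpoint (OpenStack v3 auth).
    # Requires: pip install python-swiftclient python-keystoneclient
    # All credential values below are placeholders from your own project.
    conn = swiftclient.Connection(
        authurl="https://auth.cloud.ovh.net/v3",
        user="openstack-user",
        key="openstack-password",
        os_options={"project_id": "your-tenant-id", "region_name": "GRA"},
        auth_version="3",
    )

    # Create the container, then upload the application code and its
    # Conda environment description next to it.
    conn.put_container("my-spark-container")
    for name in ("pi.py", "environment.yml"):
        with open(name, "rb") as f:
            conn.put_object("my-spark-container", name, contents=f)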

If you don't have application code yet but would still like to try OVHcloud Data Processing, you can download and use the Pi sample from the Apache Spark repository.
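
For reference, the heart of that sample looks roughly like the sketch below: it scatters random points over the unit square and estimates pi from the fraction that land inside the quarter circle. The partition and sample counts are arbitrary:

    from operator import add
    from random import random

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("PythonPi").getOrCreate()

        partitions = 10
        n = 100000 * partitions

        def inside(_):
            # Draw a random point in the unit square and count it
            # if it falls inside the quarter circle of radius 1.
            x, y = random(), random()
            return 1 if x * x + y * y <= 1 else 0

        count = spark.sparkContext.parallelize(range(n), partitions) \
                     .map(inside).reduce(add)
        print("Pi is roughly %f" % (4.0 * count / n))

        spark.stop()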

If your application has package requirements or needs a specific version of Python to run on the cluster, make sure you list them in your Conda environment.yml file and upload it to the same Object Storage container as your Python files.
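
As an illustration, such a file might look like the sketch below; the environment name, Python version and packages are placeholders rather than platform requirements:

    # environment.yml -- placeholder values; list whatever your job needs
    name: my-spark-env
    dependencies:
      - python=3.7
      - numpy
      - pip
      - pip:
          - requests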

 

Step 3: Submit your Apache Spark job

You can submit a job via the Control Panel, the API or the CLI (spark-submit).

Here we detail how to do it with the Control Panel.

Now, back in the Data Processing section, click on "Submit a job".

Fill in the “Submit a job” form that is now displayed, then click the Submit job button to submit your Apache Spark job.

Please see How to fill job submit form in Data Processing Manager for more details.
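
If you would rather submit through the API, a sketch with the official ovh Python wrapper could look like the following. The endpoint path and the engine parameter names are assumptions drawn from the Data Processing API section; check the interactive API console (https://api.ovh.com/) for the exact schema before relying on them:

    import ovh

    # Credentials are read from ovh.conf or environment variables
    # (pip install ovh). The project ID and all values below are placeholders.
    client = ovh.Client()

    # Assumed endpoint and payload for a Spark job submission.
    job = client.post(
        "/cloud/project/your-project-id/dataProcessing/jobs",
        containerName="my-spark-container",  # Object Storage container with your code
        engine="spark",
        engineVersion="2.4.3",
        name="pi-job",
        region="GRA",
        engineParameters=[
            {"name": "main_application_code", "value": "pi.py"},
            {"name": "driver_cores", "value": "1"},
            {"name": "driver_memory", "value": "4G"},
            {"name": "executor_cores", "value": "1"},
            {"name": "executor_memory", "value": "4G"},
            {"name": "executor_num", "value": "2"},
        ],
    )
    print(job["id"], job["status"])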

Jobs list

Step 4: Check information, status and logs of a job

In the Data Processing section of the OVHcloud Manager, you can see the list of all the jobs you have submitted so far. If you click on a job’s name, you can see detailed information about it, including its status. You can then click on Logs to follow the live logs while the job is running.

Once the job is finished, the complete logs are saved to your Object Storage container, and you can download them from your account whenever you like.

Please see How to check your job’s logs in the Data Processing manager page for more details.
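
You can also fetch the saved logs programmatically once the job is done. Here is a minimal sketch with python-swiftclient, using the same kind of connection as in Step 2; the object name is a placeholder, and the real one appears in the job's details:

    import swiftclient

    conn = swiftclient.Connection(
        authurl="https://auth.cloud.ovh.net/v3",
        user="openstack-user",                 # placeholders, as in Step 2
        key="openstack-password",
        os_options={"project_id": "your-tenant-id", "region_name": "GRA"},
        auth_version="3",
    )

    # Download the complete log file that the platform wrote after the job
    # finished, then print it. The object name below is a placeholder.
    headers, body = conn.get_object("my-spark-container", "my-job-id/job-logs.log")
    print(body.decode("utf-8"))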

 

Step 5: Done! Check your job’s results

After your Spark job has finished, you can check the results in your logs, as well as in any connected storage your job was designed to update.

 

Pricing and limitations

You can find the exact capabilities and limitations in the official documentation:

https://docs.ovh.com/gb/en/data-processing/capabilities/

 


 

Now available for everyone!

 

Simply connect to the OVHcloud Control Panel and use "Data Processing" in the Public Cloud section.

 

Now available! Try it in the Public Cloud Control Panel

 

Talk data processing with us on Gitter.im.

Trademark policies:

  • The Apache Spark brand and logo are the property of the Apache Software Foundation.
