OVHcloud Data Processing

Process your data easily and at scale, powered by Apache Spark

Do you need Apache Spark computation at scale but don't have the machines?
Do you lack the time to build a cluster and handle all the installation and configuration?
Do you only need a cluster for a few minutes rather than forever?
Or do you just want an easy way to try out the power of Apache Spark?

Try out our new solution, OVHcloud Data Processing.

 

Now available! Try it for free in the Public Cloud control panel.

 

How it works

Apache Spark is a framework that lets you process data at large scale, with speed and ease of use.

OVHcloud Data Processing simplifies your tasks. You bring your Apache Spark code (Java or Python) and your data, and we take care of everything else: setting up a cluster, processing your data, and deleting the resources that were used.

 

You can find a few examples and use cases in the Apache Spark documentation: https://spark.apache.org/

OVHcloud Data Processing workflow

 

 

 

Getting Started

You can find the full OVHcloud Data Processing documentation here: https://docs.ovh.com/gb/en/data-processing/

 

Step 1: Connect to the OVHcloud control panel

Connect to the OVHcloud control panel and find Data Processing in the left-hand menu of the Public Cloud section.

Validate the legal terms.

Note: currently available only in the EUROPE region (the CANADA region is coming soon).

 

control panel

 

 

 

Step 2: Upload your Python application code and requirements file

Before running your job on the Data Processing platform, you need to create a container in your Object Storage for the job, then upload your Python application files and your environment.yml file into this container.

Please see “Creating Storage Containers in Customer Panel” or “Create an object container in Horizon” for more details.

If you don't have application code yet but would still like to try OVHcloud Data Processing, you can download and use the Pi sample from the Apache Spark repository.
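For orientation, a Pi estimation job in the spirit of that sample might look like the sketch below. It is not the file from the Apache Spark repository: the function names (`inside_unit_circle`, `estimate_pi`, `run_pi_job`), the application name and the default sample count are assumptions chosen for illustration. PySpark is imported inside `run_pi_job` so that the two helper functions can be checked on a machine without Spark.

```python
import random
from operator import add


def inside_unit_circle(_):
    """Draw one random point in the square [-1, 1] x [-1, 1].

    Returns 1 if the point falls inside the unit circle, else 0.
    The argument is ignored; it only exists so the function can be mapped.
    """
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0


def estimate_pi(hits, samples):
    """The circle covers pi/4 of the square, so pi is roughly 4 * hits / samples."""
    return 4.0 * hits / samples


def run_pi_job(samples=100000, partitions=2):
    """Spread the sampling over the cluster and return the Pi estimate."""
    # Imported here so the helpers above stay usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PiSketch").getOrCreate()
    try:
        hits = (
            spark.sparkContext
            .parallelize(range(samples), partitions)
            .map(inside_unit_circle)
            .reduce(add)
        )
    finally:
        # Always release the session, even if the job fails.
        spark.stop()
    return estimate_pi(hits, samples)
```

To use this as a job, you would end the file with a call such as `print("Pi is roughly", run_pi_job())` so that the result appears in the job logs.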

If your application depends on specific packages or needs a specific version of Python to run on the cluster, list them in your Conda environment.yml file and upload it to the same Object Storage container as your Python files.
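As an illustration of what such a file contains, the sketch below renders a minimal Conda environment.yml from a Python version and a package list. The helper name `render_environment_yml`, the environment name, the `defaults` channel and the packages shown are all assumptions; check the Data Processing documentation for any constraints the platform places on this file.

```python
def render_environment_yml(name, python_version, packages, channels=("defaults",)):
    """Build the text of a minimal Conda environment.yml file.

    name           -- environment name (hypothetical, pick your own)
    python_version -- Python version to pin, for example "3.7"
    packages       -- Conda package specs, for example ["numpy=1.16", "pandas"]
    channels       -- Conda channels to search, in priority order
    """
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines.append("  - python=" + python_version)
    lines += ["  - " + package for package in packages]
    return "\n".join(lines) + "\n"


# Example values only; replace them with what your job really needs.
example = render_environment_yml("my-spark-job", "3.7", ["numpy=1.16", "pandas"])
print(example)
```

Saving the printed text as environment.yml gives you a file that can be uploaded next to your Python files.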

 

Step 3: Submit your Apache Spark job

You can submit a job through the control panel, the API, or the CLI (spark-submit).

Here we describe how to do it with the control panel.

Back in the Data Processing section, click on "Submit a job".

Fill in the “Submit a job” form that is displayed, then click the Submit job button to submit your Apache Spark job.

Please see “How to fill job submit form in Data Processing Manager” for more details.

control panel

 

Step 4: Check information, status and logs of a job

In the Data Processing section of the OVHcloud Manager you can see the list of all the jobs you have submitted so far. Click on a job’s name to see detailed information about it, including its status. You can then click on Logs to follow the live logs while the job is running.

Once the job has finished, the complete logs are saved to your Object Storage container. You can download them from your account at any time.

Please see the “How to check your job’s logs in the Data Processing manager” page for more details.


Step 5: Done! Check your job’s results

After your Spark job has finished, you can check the results in your logs as well as in any connected storage your job was designed to update.

 

Pricing and limitations

You can find the exact capabilities and limitations in the official documentation:

https://docs.ovh.com/gb/en/data-processing/capabilities/

 

Data Processing - Lab plan

Price: 100% free for the lab (except if you use Object Storage)

Compute:

  • Minimum 1 and maximum 30 GB of RAM per executor
  • Minimum 1 and maximum 8 CPU cores per executor
  • Minimum 1 and maximum 10 executors per job
  • Maximum duration of 24 hours per job

Number of jobs: no limit

Apache Spark version available: 2.4.3

Tools/languages supported:

  • Java 8
  • Scala 2.12
  • Python v2.7+ and v3.4+
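Before filling in the job form, it can help to check a planned configuration against the Lab plan compute limits listed above. The following is a minimal sketch that runs on your own machine; the function name `check_lab_limits` and its parameter names are hypothetical and are not part of any OVHcloud API.

```python
# Compute limits copied from the Lab plan above: (minimum, maximum).
LAB_LIMITS = {
    "executor_memory_gb": (1, 30),
    "executor_cores": (1, 8),
    "executors": (1, 10),
}
MAX_JOB_HOURS = 24


def check_lab_limits(executor_memory_gb, executor_cores, executors, expected_hours=None):
    """Return a list of problems; an empty list means the request fits the Lab plan."""
    requested = {
        "executor_memory_gb": executor_memory_gb,
        "executor_cores": executor_cores,
        "executors": executors,
    }
    problems = []
    for key, (low, high) in LAB_LIMITS.items():
        value = requested[key]
        if not low <= value <= high:
            problems.append("{} = {} is outside {}..{}".format(key, value, low, high))
    # The duration check is optional because run time is often unknown in advance.
    if expected_hours is not None and expected_hours > MAX_JOB_HOURS:
        problems.append(
            "expected duration of {} h exceeds {} h".format(expected_hours, MAX_JOB_HOURS)
        )
    return problems


# Example: 4 GB and 2 cores per executor, 3 executors, about 1 hour of work.
print(check_lab_limits(4, 2, 3, expected_hours=1))
```

An empty list means the request is within the limits; otherwise each entry names the setting that needs to change.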

 


 

Lab access: now available for everyone!

 

Simply connect to the OVHcloud control panel and use "Data Processing" in the Public Cloud section.

 


 

Talk data processing with us on Gitter.im

Trademark policies:

  • The Apache Spark brand and logo are the property of the Apache Software Foundation

Status

  • ALPHA
  • BETA
  • GAMMA