OVHcloud Data Processing
Process your data easily and at scale, powered by Apache Spark
Do you need Apache Spark computation at scale, but don't have the machines?
Don't have time to create a cluster of computers and handle all the installation and configuration?
Do you only need a cluster for a few minutes, not forever?
Or do you simply want an easy way to try out the power of Apache Spark?
Then try our new solution, OVHcloud Data Processing.
How it works
Apache Spark is a framework for performing data processing at large scale, with speed and ease of use.
OVHcloud Data Processing simplifies your tasks. You bring your Apache Spark code (Java or Python) and your data, and we take care of everything else: setting up a cluster, processing your data, and deleting the resources afterwards.
You can find a few examples and use cases in the Apache Spark documentation: https://spark.apache.org/
You can find the full OVHcloud Data Processing documentation here: https://docs.ovh.com/gb/en/data-processing/
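To give a feel for the kind of job you can run, here is a minimal pure-Python sketch of the computation behind Apache Spark's well-known Pi example: estimating pi by sampling random points in the unit square. (This sketch runs on a single machine; the function name and sample count are illustrative.)

```python
import random


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling random points in the unit square
    and counting how many land inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The fraction of points inside the quarter circle approximates pi/4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

In the Spark version of this example, the sampling loop is distributed across the cluster with `parallelize(...).map(...).reduce(...)`, so the executors split the work between them.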
Step 1: Connect to the OVHcloud control panel
Connect to the OVHcloud control panel and find Data Processing in the Public Cloud left-hand menu.
Validate the legal terms.
Note: the service is currently only available in the EUROPE region (coming soon to the CANADA region).
Step 2: Upload your Python application code and requirements file
Before running your job on the Data Processing platform, you need to create a container in your Object Storage and upload your application's Python files and environment.yml file into this container.
If you don't have any application code yet but would still like to try OVHcloud Data Processing, you can download and use the Pi sample from the Apache Spark repository.
If your application has package requirements, or needs a specific version of Python to run on the cluster, make sure you list them in your Conda environment.yml file and upload it into the same Object Storage container as your Python files.
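As an illustration, a Conda environment.yml could look like the sketch below (the environment name, package names, and versions are placeholders, not OVHcloud requirements):

```yaml
name: my-spark-job        # hypothetical environment name
dependencies:
  - python=3.7            # pin the Python version the cluster should use
  - numpy                 # example Conda package requirement
  - pip
  - pip:
      - requests          # packages only available via pip go here
```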
Step 3: Submit your Apache Spark job
You can submit a job via the control panel, the API, or the CLI (spark-submit).
Here we detail how to do it with the control panel.
Back in the Data Processing section, click on "Submit a job".
Fill in the "Submit a job" form that is displayed, then push the "Submit job" button to submit your Apache Spark job.
Please see the "How to fill the job submit form in the Data Processing Manager" page for more details.
Step 4: Check the information, status and logs of a job
In the Data Processing section of the OVHcloud Manager, you can see the list of all the jobs you have submitted so far. If you click on a job's name, you can see detailed information about it, including its status. You can then click on "Logs" to see the live logs while the job is running.
Once the job is finished, the complete logs are saved to your Object Storage container, and you can download them from your account whenever you like.
Please see the "How to check your job's logs in the Data Processing Manager" page for more details.
Step 5: Done! Check your job's results
After your Spark job is finished, you can check the results in your logs, as well as in any connected storage your job was designed to update.
Pricing and limitations
You can find the exact capabilities and limitations in the official documentation:
|Data Processing - Lab plan|
|Price||100% free during the lab (except for Object Storage usage)|
|RAM per executor||minimum 1 GB, maximum 30 GB|
|CPU cores per executor||minimum 1, maximum 8|
|Executors per job||minimum 1, maximum 10|
|Job duration||maximum 24 hours|
|Number of jobs||no limitation|
|Apache Spark version||2.4.3|
|Python versions||2.7+ and 3.4+|
Lab access: now available to everyone!
Simply connect to the OVHcloud control panel and use "Data Processing" in the Public Cloud section.
Trademark policies:
- The Apache Spark brand and logo are the property of the Apache Software Foundation.