Office of Advanced Research Computing
- This event has passed.
Python for Big Data – Part 1
November 11, 2022 @ 2:00 pm - 4:00 pm
Workshop content: In recent years, Python has become one of the top programming languages for data analysis, thanks to its simplicity, readability, and portability. However, Python code typically runs slower than equivalent C or Fortran code, and it does not manage memory as efficiently. These limitations in speed and memory management may not matter when analyzing small datasets, but they become bottlenecks when analyzing big datasets.
To address the challenges of big data analytics, the Python community has developed and tested several techniques. In this workshop, we will go through some of them, including vectorization, parallelization, just-in-time compilation, and distributed task execution. Hands-on exercises will address the following questions.
Objectives
- How to speed up data analysis?
- What to do when the data set size exceeds the available physical memory?
- How to distribute the workloads when doing machine learning for big data sets?
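As an illustration of the second question, here is a minimal sketch of processing a file in fixed-size chunks so that only a small part of the data is in memory at any time. It is not taken from the workshop material: the function names `iter_chunks` and `streaming_mean`, the column name `value`, and the sample data are all hypothetical, and it uses only the Python standard library.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(file_obj, column, chunk_size=1000):
    """Compute the mean of one numeric CSV column, one chunk at a time."""
    reader = csv.DictReader(file_obj)
    total = 0.0
    count = 0
    for chunk in iter_chunks(reader, chunk_size):
        # Only chunk_size rows are held in memory here, never the whole file.
        values = [float(row[column]) for row in chunk]
        total += sum(values)
        count += len(values)
    if count == 0:
        raise ValueError("no rows found")
    return total / count


# Small in-memory stand-in for a large file on disk (hypothetical data).
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n")
mean_value = streaming_mean(sample, "value", chunk_size=2)
print(mean_value)
```

The same pattern works with a real file handle from `open(...)`, since the CSV reader pulls rows lazily. Only running totals are kept between chunks, so memory use depends on the chunk size rather than on the size of the dataset.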
If you have questions or need help, please email Bala Desinghu.
Amarel account: Apply here as soon as possible. You must have an Amarel account set up before the workshop.
VPN setup: You must be connected to the Rutgers network or to the VPN to access Amarel resources.
SSH setup: Windows users must install an SSH client like PuTTY or MobaXterm. Alternatively, Windows 10 users can install the complete Windows Subsystem for Linux.