Python is an experiment on how much freedom programmers need. Too much freedom and nobody can read another’s code; too little and expressiveness is endangered.
– Guido van Rossum, creator of the Python programming language
Last year, Python was named the most popular programming language. Its growing popularity can be attributed to the rise of data science and the machine learning ecosystem, along with software libraries such as Pandas, TensorFlow, PyTorch and NumPy. The fact that it is easy to learn also helps Python gain favour among programmers.
That said, Python is much slower than compiled languages like Rust or Fortran. This is mainly because Python is an interpreted language: each instruction carries significant overhead, which slows down large computations and would seem to make Python unsuitable for scientific and high-performance computing (HPC). In this article, we explore why that is not the whole truth and why Python is increasingly preferred for these tasks.
Python as a glueing layer
In the case of languages like C, C++ or Fortran, the source code is first compiled to an executable format before it can be run. With Python, there is no separate ahead-of-time compilation to machine code; the interpreter executes the program on the fly, statement by statement. The main advantage of an interpreted language like Python is its flexibility: variables do not need to be declared in advance, and the program can adapt as it runs.
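As a small illustration of that flexibility, the sketch below rebinds one name to values of different types without any declaration. The helper `describe` and the variable names are hypothetical, chosen only for this example.

```python
def describe(value):
    """Return the name of the runtime type of whatever is passed in."""
    return type(value).__name__


x = 42                      # no declaration: x is simply bound to an int
kinds = [describe(x)]

x = "forty-two"             # the same name can be rebound to a str
kinds.append(describe(x))

x = [4, 2]                  # ...and then to a list
kinds.append(describe(x))

print(kinds)                # ['int', 'str', 'list']
```

The type is attached to the value rather than to the name, which is why the interpreter has to check it at run time. That check is part of the per-instruction overhead mentioned earlier.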
The main disadvantage, as discussed earlier, is the slower execution of numerically intensive programs, which on its own would make Python a poor fit for scientific computing. However, time-intensive subroutines can be written in C or Fortran, compiled, and then imported into Python so that they behave like normal Python functions.
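One way to see compiled code behaving like a Python function is the standard-library `ctypes` module. The sketch below is a minimal example, assuming the system exposes a C math library that `ctypes.util.find_library("m")` can locate; where it cannot (for instance on some Windows setups), the script falls back to `math.cos`. The helper name `load_c_cos` is hypothetical.

```python
import ctypes
import ctypes.util
import math


def load_c_cos():
    """Try to load cos() from the system C math library via ctypes."""
    # The library name varies by platform; find_library may return None.
    path = ctypes.util.find_library("m")
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)
        # Declare the C signature: double cos(double).
        libm.cos.argtypes = [ctypes.c_double]
        libm.cos.restype = ctypes.c_double
        return libm.cos
    except (OSError, AttributeError):
        return None


c_cos = load_c_cos()
if c_cos is None:
    # Assumption: fall back to Python's math.cos so the script still runs.
    c_cos = math.cos

# The compiled routine is called exactly like an ordinary Python function.
print(c_cos(0.0))
```

Packages such as NumPy rely on more elaborate binding mechanisms than this, but the principle is the same: the caller sees a Python function while the work happens in compiled code.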
Many common mathematical and numerical routines are pre-compiled to run very fast. They are grouped into two packages that can be added to Python transparently. Python is thus often used as a glueing layer: it strings together optimised compiled packages to perform the target computations. The most widespread package in scientific computing is NumPy (Numerical Python), which offers basic routines for manipulating large arrays and matrices of numeric data. This manipulation is not done in plain Python; instead, all the heavy lifting is done behind the scenes by compiled C/C++ or Fortran routines.
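To make the difference concrete, the following sketch computes the same dot product twice: once with an explicit Python loop and once with `np.dot`, which hands the work to compiled code. The array size and the helper `dot_pure_python` are arbitrary choices for illustration, and the timings it prints will vary from machine to machine.

```python
import timeit

import numpy as np


def dot_pure_python(xs, ys):
    """Dot product with an explicit Python loop (interpreted per element)."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


n = 100_000
a = np.arange(n, dtype=np.float64)   # 0.0, 1.0, ..., n - 1
b = np.ones(n, dtype=np.float64)
a_list, b_list = a.tolist(), b.tolist()

slow = dot_pure_python(a_list, b_list)
fast = float(np.dot(a, b))           # the same sum, computed in compiled code

# Time five repetitions of each approach.
t_loop = timeit.timeit(lambda: dot_pure_python(a_list, b_list), number=5)
t_numpy = timeit.timeit(lambda: np.dot(a, b), number=5)

print(f"results agree: {np.isclose(slow, fast)}")
print(f"pure Python: {t_loop:.4f} s, NumPy: {t_numpy:.4f} s")
```

Both versions return the same value; only the place where the loop runs changes.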
Further, the SciPy (Scientific Python) package extends the functionality of NumPy with a collection of algorithms for minimisation, Fourier transforms, regression and other applied mathematics techniques. The popularity of both packages is soaring in the scientific community. They have also made Python comparable to, if not better than, expensive commercial packages like MATLAB.
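As a brief, hedged illustration of two of these capabilities, the sketch below minimises a simple quadratic with `scipy.optimize.minimize_scalar` and finds the dominant frequency of a sine wave with `scipy.fft.rfft`. The test function `f` and the signal parameters (a 5 Hz tone sampled at 100 Hz for one second) are assumptions made up for this example.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.optimize import minimize_scalar


def f(x):
    """A simple quadratic whose minimum lies at x = 2."""
    return (x - 2.0) ** 2 + 1.0


# Minimisation: with no bounds given, SciPy uses Brent's method by default.
result = minimize_scalar(f)
print(f"minimum near x = {result.x:.6f}")

# Fourier transform: a 5 Hz sine sampled at 100 Hz for one second.
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)

spectrum = np.abs(rfft(signal))                # magnitude of each frequency bin
freqs = rfftfreq(signal.size, d=1 / sample_rate)
peak_freq = float(freqs[np.argmax(spectrum)])  # frequency with the largest magnitude
print(f"dominant frequency: {peak_freq} Hz")
```

In both cases the Python code only describes the problem; the numerical work is done inside SciPy's compiled routines.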
Credit: The COOP Blog
Python for HPC
A team of researchers from Imperial College London demonstrated the viability of Python as a platform for productive, portable and performant HPC applications at petascale. Freddie Witherden, a member of the team, said that Python was a ‘first-class language’ for HPC. He gave three reasons for this: an increased emphasis on application performance and on developer and user productivity with HPC codes; the growing tendency of HPC applications to rely on third-party APIs; and the increased use of code generation to address performance bottlenecks. He said that Python could address these factors and that it puts the highest levels of performance on HPC hardware within researchers’ reach. The team was nominated for the prestigious ACM Gordon Bell Prize.
Expert opinion
“This is far from the truth, and the answer depends on the layer of the software stack referred to. In particular, the choice of programming language for end-users is very different from the one for those implementing the underlying systems, libraries, compilers, and runtimes. For the former, Python is popular as most end-user programming models are Python-based (e.g. TensorFlow, PyTorch), Python packages for high-performance and scientific computing are widely available, and Python offers high programmer productivity. However, for the underlying programming model implementations, libraries, compilers, and runtimes, C, C++, and CUDA are still the languages of choice as they deliver performance. Ultimately, all the performant Python packages themselves internally map to libraries written in C, C++, or CUDA. High performance is ultimately derived from those optimised libraries or, in some cases, from just-in-time compilers and code generators. So, C and C++ are still the languages of choice to implement the underlying libraries, compilers, code generators, or runtimes,” said Uday Bondhugula, Founder and CTO, Polymage Labs.
“Python is indeed the go-to language in scientific and high-performance computing. Because it is simple, scalable, versatile, efficient and platform-agnostic, it is gaining popularity among programmers, data scientists, ML engineers, and data analysts. It includes hundreds of publicly available libraries and frameworks and feature-rich packages for data manipulation (Pandas) and machine learning (scikit-learn). In addition, Python has been utilised in several enterprise AI frameworks (TensorFlow, PyTorch, etc.). Due to the abundance of open-source tools, academia appears to be moving away from R (and similar platforms), allowing Python to emerge as the preferred language among enthusiasts. Python also scales to enterprise needs by using packages like Dask, PySpark, Koalas and others,” said Pavan Nanjundaiah, Senior Director, Tredence Studio.