Useful packages

Python Packages

Data Processing

Pandas

Pandas Site

Probably the most popular Python library for data science and analytics. If you work with data, you will most likely use pandas somewhere along the line.
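As a rough illustration of the kind of work pandas is typically used for, here is a minimal sketch that builds a small DataFrame and aggregates it. The `sales` data and column names are invented for this example.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, 4, 7, 9],
    }
)

# Total units per region, returned as a Series indexed by region.
units_per_region = sales.groupby("region")["units"].sum()
print(units_per_region)
```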

Polars

TIP

Worth learning if you are working with large datasets, but do not want to use PySpark.

Polars Site

Polars is a blazingly fast DataFrame library implemented in Rust and based on Apache Arrow, with bindings for Python.

It is designed for speed and space efficiency.

According to its docs, it is the new standard for DataFrames in Python.

PySpark

See the Apache Spark section to learn more.

PySpark Section

Scheduling

Rocketry

Official Docs

Airflow

Airflow™ is a platform created by the community to programmatically author, schedule and monitor workflows.

See the Airflow Section to learn more.

HTTP

Sync - requests

Requests allows you to send HTTP/1.1 requests extremely easily.

There’s no need to manually add query strings to your URLs or to form-encode your PUT and POST data. For JSON payloads, pass the json argument and Requests encodes the body for you.

Docs

bash
poetry add requests

Sync Example

python
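import requests

# Fetch a page; the timeout stops the call from hanging indefinitely
response = requests.get('https://w3schools.com/python/demopage.htm', timeout=10)
response.raise_for_status()  # raise an error for 4xx/5xx responses

print(response.text)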
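To show what "no need to manually add query strings" means in practice, the sketch below prepares two requests without sending them, so the encoded URL and JSON body can be inspected offline. The `example.com` URLs and the payload are placeholders, not a real API.

```python
import json

import requests

# Build (but do not send) a GET request; URL and parameters are placeholders.
get_request = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "data science", "page": 2},
).prepare()
print(get_request.url)  # the query string has been encoded by Requests

# Build a POST request whose body is serialized from the json argument.
payload = {"name": "pandas", "stars": 5}
post_request = requests.Request(
    "POST", "https://example.com/items", json=payload
).prepare()
print(post_request.headers["Content-Type"])
print(json.loads(post_request.body))
```

In everyday use the same `params` and `json` arguments can be passed straight to `requests.get` and `requests.post`.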

Async - httpx

HTTPX is a fully featured HTTP client for Python 3, which provides sync and async APIs, and support for both HTTP/1.1 and HTTP/2.

Docs

bash
poetry add httpx

Async Example

python
import asyncio
import httpx

async def main() -> None:
    async with httpx.AsyncClient() as client:
        r = await client.get('https://www.example.com/')
    print(r)  # e.g. <Response [200 OK]>

asyncio.run(main())

YAML

PyYAML

PyYAML - Docs

Read

python
import yaml

with open('file.yaml') as f:
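    try:
        # safe_load only builds plain Python types, so it is safe for untrusted files
        data = yaml.safe_load(f)
        print(data)
    except yaml.YAMLError as e:
        # Raised when the file is not valid YAML
        print(e)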

Write

python
import yaml

data = {
    'list': [1, 42, 3.141, 1337, 'help'],
    'string': 'bla',
    'dict': {
        'foo': 'bar',
        'key': 'value',
        'bar': 50
    }
}

with open("file_3.yaml", "w") as f:
    yaml.dump(data, f)
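PyYAML can also work on strings rather than files, which is handy for quick checks. The following is a minimal round-trip sketch using `yaml.safe_dump` and `yaml.safe_load`; the `config` dictionary is invented for this example.

```python
import yaml

# Hypothetical settings, invented for illustration.
config = {"name": "demo", "retries": 3, "tags": ["a", "b"]}

# Serialize to a YAML string, keeping the original key order.
text = yaml.safe_dump(config, sort_keys=False)
print(text)

# Parse the string back into plain Python objects.
restored = yaml.safe_load(text)
print(restored == config)
```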

SFTP

Paramiko

Paramiko - Docs

Paramiko is a pure-Python implementation of the SSHv2 protocol, providing both client and server functionality.

Main use cases are:

  • controlling an SSH server
  • transferring files with SFTP

AWS

SDK

To interact with AWS services, you can use the boto3 library.

bash
# Add boto3 to your environment
poetry add boto3

Official Docs

Data Apps

Streamlit

Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience required.

Streamlit - Site

Feel free to use any content here.