Useful packages
Data Processing
Pandas
Pandas is probably the most popular Python library for data science and analytics. If you work with data in Python, you’ll most likely use it somewhere along the line.
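As a rough illustration of typical pandas usage, the sketch below builds a small DataFrame, derives a column and aggregates it per group. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 7, 9],
    "unit_price": [2.5, 3.0, 2.5, 3.0],
})

# Derive a new column from existing ones (vectorised, no explicit loop)
df["revenue"] = df["units"] * df["unit_price"]

# Aggregate revenue per region; the result is a Series indexed by region
revenue_by_region = df.groupby("region")["revenue"].sum()
print(revenue_by_region)
```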
Polars
TIP
Worth learning if you are working with large datasets, but do not want to use PySpark.
Polars is a blazingly fast DataFrame library implemented in Rust and based on Apache Arrow, with bindings for Python.
It is designed for speed and space efficiency.
Its documentation describes it as the new standard for DataFrames in Python.
PySpark
See the Apache Spark section to learn more.
Zip
WIP
Work in progress
Scheduling
Rocketry
Airflow
Airflow™ is a platform created by the community to programmatically author, schedule and monitor workflows.
See the Airflow Section to learn more.
Prefect
WIP
Work in progress
Dagster
WIP
Work in progress
HTTP
Sync - requests
Requests allows you to send HTTP/1.1 requests extremely easily.
There’s no need to manually add query strings to your URLs or to form-encode your PUT and POST data: pass query parameters with the params argument, and send JSON bodies with the json argument.
poetry add requests
Sync Example
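import requests

# Fetch a page; the timeout stops the call from hanging indefinitely
response = requests.get('https://w3schools.com/python/demopage.htm', timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses
print(response.text)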
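To see what Requests does with query strings and JSON bodies, the sketch below prepares two requests without sending them, so it runs offline. The endpoint paths and payloads are hypothetical.

```python
import json

import requests

# Build (but do not send) a GET request; params becomes the query string
get_request = requests.Request(
    "GET",
    "https://www.example.com/search",  # hypothetical endpoint
    params={"q": "polars", "page": 2},
).prepare()
print(get_request.url)

# Build a POST request; json= serialises the dict and sets the Content-Type header
post_request = requests.Request(
    "POST",
    "https://www.example.com/items",  # hypothetical endpoint
    json={"name": "example", "tags": ["a", "b"]},
).prepare()
print(post_request.headers["Content-Type"])
print(json.loads(post_request.body))
```

In everyday use the same arguments go straight to requests.get(url, params=...) or requests.post(url, json=...), which prepare and send the request in one call.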
Async - httpx
HTTPX is a fully featured HTTP client for Python 3, which provides sync and async APIs, and support for both HTTP/1.1 and HTTP/2.
poetry add httpx
Async Example
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        r = await client.get('https://www.example.com/')
        print(r)  # e.g. <Response [200 OK]>
asyncio.run(main())
YAML
PyYAML
Read
import yaml

with open('file.yaml') as f:
    try:
        # safe_load parses plain YAML without constructing arbitrary Python objects
        data = yaml.safe_load(f)
        print(data)
    except yaml.YAMLError as e:
        print(e)
Write
import yaml

data = {
    'list': [1, 42, 3.141, 1337, 'help'],
    'string': 'bla',
    'dict': {
        'foo': 'bar',
        'key': 'value',
        'bar': 50
    }
}

with open("file_3.yaml", "w") as f:
    yaml.dump(data, f)
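When no file is involved, the same read and write steps work on strings. This is a minimal round-trip sketch; the config dictionary is made up, and sort_keys=False assumes PyYAML 5.1 or newer.

```python
import yaml

# Hypothetical configuration, invented for illustration
config = {"name": "pipeline", "retries": 3, "steps": ["extract", "load"]}

# safe_dump returns a string when no stream is given; sort_keys=False keeps insertion order
text = yaml.safe_dump(config, sort_keys=False)
print(text)

# Parsing the text again should give back an equal dictionary
restored = yaml.safe_load(text)
```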
SFTP
Paramiko
Paramiko is a pure-Python implementation of the SSHv2 protocol, providing both client and server functionality.
Main use cases are:
- controlling an SSH server
- transferring files with SFTP
AWS
SDK
To interact with AWS services, you can use the boto3 library.
# Add boto3 to your environment
poetry add boto3
Data Apps
Streamlit
Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience required.