Python for Automation: Scripting and Task Scheduling

Updated: at 01:12 AM

Python is a versatile, user-friendly programming language that is ideal for automating repetitive tasks, creating scripts, and scheduling jobs. With its wide range of built-in and third-party libraries and modules, Python provides simple yet powerful tools to automate workflows, reduce human effort, and optimize efficiency. This guide will discuss how to leverage Python for automation, with a focus on scripting and scheduling tasks.

Introduction

Automation is the use of software to create instructions and rules to complete jobs with minimal human intervention. By scripting a sequence of commands, Python can replicate tedious manual processes quickly and accurately. Python’s readability, flexibility, and active community support also make it one of the most popular languages for automation.

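Some key advantages of using Python for automation include its readable syntax, its large collection of built-in and third-party libraries, and its active community.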

This guide will cover basic Python scripting and task scheduling concepts using relevant code examples. Let’s examine the key aspects of using Python for automation.

Python Scripting for Automation

Python scripts automate manual processes by executing a series of predefined instructions. Scripts can run commands, call application programming interfaces (APIs), handle files and data, manage networks, and execute many other tasks to replace repetitive human effort.

The following sections show how Python scripting can be used for automation.

Basic Script Structure

A Python automation script contains import statements for required modules, function definitions, and a main function that executes the program logic.

Here is a simple example of a Python script structure:

#!/usr/bin/env python3
import os
import sys

def main():
  print("Hello World!")

if __name__ == "__main__":
  main()

Executing Shell Commands

The subprocess module is used to execute shell commands and interact with the operating system. This allows Python to run bash scripts, utilities like grep and sed, system administration commands, and any third-party executables.

Here is an example to run a shell command and capture its output:

import subprocess

cmd = "ls -l"
result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print(result.stdout.decode())

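The key parameters are shell=True, which runs the command string through the system shell, and stdout=subprocess.PIPE and stderr=subprocess.PIPE, which capture the command's standard output and standard error as bytes.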

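Multiple commands can be executed by looping over a list and calling subprocess.run() once per command. Passing the whole list in a single call with shell=True would run only the first command:

commands = ["ping -c 4 google.com", "nslookup google.com"]
for cmd in commands:
  subprocess.run(cmd, shell=True)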
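
When a command does not need shell features such as pipes or wildcards, it can also be passed as a list of arguments without shell=True, which avoids quoting problems. The following is a minimal sketch rather than the only way to do it: run_command is a hypothetical helper name, and the child command (the Python interpreter printing a line) is just a portable stand-in for whatever utility a script would really call.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as an argument list and return its stdout as text."""
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Portable stand-in for a real utility: ask the Python interpreter to print a line.
output = run_command([sys.executable, "-c", "print('hello from child')"])
print(output.strip())
```

With check=True a failing command raises subprocess.CalledProcessError, so the script cannot silently carry on after an error.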

Handling Files and Data

Automation scripts frequently need to process files and data. Python has excellent capabilities for file I/O operations.

Reading a text file line-by-line:

with open("data.txt") as file:
  for line in file:
    print(line.strip())

The with block automatically closes the file after the nested code block executes.

Writing data to a CSV file:

import csv

data = [["Name", "Age", "Occupation"],
        ["John", "32", "Developer"],
        ["Mary", "28", "Data Analyst"]]

with open("data.csv", "w", newline="") as file:
  writer = csv.writer(file)
  writer.writerows(data)

The csv module is useful for handling CSV data.

These are just some examples of Python’s file handling capabilities - automation tasks can load configuration files, generate reports, process log files, and much more.
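
Reading the data back is the mirror image of writing it. As a rough sketch, the snippet below writes the same sample rows to a temporary file and loads them again with csv.DictReader, which maps each row to the header names. The temporary path and the average-age calculation are illustrative assumptions, not part of the example above.

```python
import csv
import os
import tempfile

rows = [["Name", "Age", "Occupation"],
        ["John", "32", "Developer"],
        ["Mary", "28", "Data Analyst"]]

# Write to a throwaway file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as file:
    csv.writer(file).writerows(rows)

# DictReader uses the first row as keys for every following row.
with open(path, newline="") as file:
    records = list(csv.DictReader(file))

# CSV values are always read as strings, so convert before doing arithmetic.
ages = [int(record["Age"]) for record in records]
average_age = sum(ages) / len(ages)
print(records[0]["Name"], average_age)
```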

Command-line Arguments

For greater flexibility, Python scripts can accept command-line arguments and options. The sys module provides access to these arguments.

import sys

print(sys.argv)

Save the above as args.py and run python args.py -o output.txt:

['args.py', '-o', 'output.txt']

sys.argv[0] contains the script name; the remaining elements are the arguments passed to the script.

Arguments can then be parsed with the built-in argparse module:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-o", "--output")
args = parser.parse_args()

print(args.output)

This allows passing dynamic configurations to scripts when executing them.
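
argparse can also validate types, supply defaults, and generate a --help message. The sketch below extends the parser with two hypothetical options (--retries and --verbose, which are not part of the script above) and passes an explicit argument list to parse_args() so the behaviour is visible without a command line.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Example automation script")
    parser.add_argument("-o", "--output", default="output.txt",
                        help="file to write results to")
    # Hypothetical options, added for illustration only.
    parser.add_argument("--retries", type=int, default=3,
                        help="how many times to retry a failed step")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra progress information")
    return parser


# Passing a list instead of reading sys.argv makes the parser easy to test.
args = build_parser().parse_args(["-o", "report.csv", "--retries", "5"])
print(args.output, args.retries, args.verbose)
```

Calling parse_args() with no argument reads sys.argv as usual, so the same parser works unchanged in a real script.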

Logging

For tracking script execution and debugging issues, the built-in logging module provides robust logging capabilities:

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

logger.debug("Debug message")
logger.info("Informational message")
logger.warning("Warning message")
logger.error("Error message")

This will print log messages to the console, each prefixed with its log level and logger name. Timestamps can be added by passing a format string to basicConfig(), and the messages can also be written to a file for analysis.
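
The output format and destination can be customized with a formatter and a handler. The sketch below is one possible setup that writes timestamped messages to a file. The file name automation.log, its temporary directory, and the logger name "automation" are placeholders chosen for illustration.

```python
import logging
import os
import tempfile

# Placeholder location; a real script would use a fixed, known path.
log_path = os.path.join(tempfile.mkdtemp(), "automation.log")

logger = logging.getLogger("automation")
logger.setLevel(logging.INFO)

handler = logging.FileHandler(log_path)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(handler)

logger.debug("Not recorded, because the logger level is INFO")
logger.info("Backup started")
logger.error("Backup failed")

handler.close()  # flush and release the file before reading it back

with open(log_path) as file:
    print(file.read())
```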

Handling Errors

Automation scripts should handle and log errors gracefully using try-except blocks:

try:
  # Code that might cause an error
  with open("data.txt") as file:
    contents = file.read()
except FileNotFoundError as ex:
  # Handle specific exception
  logger.error(ex)
except Exception as ex:
  # Handle any other exceptions
  logger.error(ex)

This lets the script log the problem and continue execution even if certain parts fail due to unforeseen issues.
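
One common pattern is to wrap each unit of work separately so that one bad input does not stop the rest. The following is a small sketch under the assumption that the script processes a list of file paths. process_files and the sample file names are hypothetical.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def process_files(paths):
    """Return (sizes, failed): character counts of readable files, and paths that failed."""
    sizes = {}
    failed = []
    for path in paths:
        try:
            with open(path) as file:
                sizes[path] = len(file.read())
        except FileNotFoundError:
            # Expected problem: note it and move on to the next file.
            logger.warning("Skipping missing file: %s", path)
            failed.append(path)
        except OSError:
            # Other I/O problems: log the traceback, then continue.
            logger.exception("Could not read %s", path)
            failed.append(path)
    return sizes, failed


# Build one real file and one path that does not exist.
folder = tempfile.mkdtemp()
good = os.path.join(folder, "good.txt")
with open(good, "w") as file:
    file.write("hello")
missing = os.path.join(folder, "missing.txt")

sizes, failed = process_files([missing, good])
print(sizes[good], failed == [missing])
```

Returning the list of failures lets the caller decide whether a partial run counts as success.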

Scheduling Scripts

For tasks that need to run on a schedule, third-party packages such as schedule and APScheduler provide scheduling capabilities.

Here is an example to run a script every 2 minutes using schedule:

import schedule
import time

def job():
  print("Scheduled job")

schedule.every(2).minutes.do(job)

while True:
  schedule.run_pending()
  time.sleep(1)

APScheduler has advanced features like cron expressions, job stores, and triggers for more configurable scheduling.

These modules allow automating scripts to run periodically without manual intervention.

Python for Task Scheduling

While scripting focuses on automation logic, task scheduling refers to configuring scripts or jobs to trigger on a timed schedule or event condition. Python has built-in and third-party libraries to schedule all kinds of tasks:

Cron jobs

Run scripts or commands at fixed intervals or dates using cron expressions. For example, run a backup every Sunday at 2 AM.

Deferred tasks

Schedule jobs to run after a certain delay from their trigger time. For example, send an email reminder 24 hours before an event.

Periodic tasks

Repeatedly execute jobs at fixed intervals. For example, update a database every 5 minutes.

Trigger-based tasks

Initiate jobs on external events such as file arrival or email receipt. For example, process files as they are uploaded over FTP.

Python supports all these types of task scheduling natively or through modules like APScheduler, Celery, and Airflow.
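
For simple deferred jobs, the standard library's sched module is enough. The sketch below queues two jobs with short delays so that it finishes quickly. The job names and delays are made up for illustration, and a long-running service would more likely use one of the libraries named above.

```python
import sched
import time

scheduler = sched.scheduler(time.monotonic, time.sleep)
completed = []


def job(name):
    completed.append(name)
    print(f"Ran {name}")


# enter(delay_in_seconds, priority, function, argument=...)
scheduler.enter(0.2, 1, job, argument=("send reminder",))
scheduler.enter(0.1, 1, job, argument=("refresh cache",))

# run() blocks until the queue is empty, executing jobs in time order.
scheduler.run()
```

The jobs run in order of their delay, not in the order they were queued, so "refresh cache" runs first here.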

APScheduler

One of the most widely used Python modules for scheduling is APScheduler. It provides flexible APIs for creating cron-like scheduled tasks.

Basic usage:

from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()

@scheduler.scheduled_job("interval", minutes=2)
def job():
  print("Scheduled job")

scheduler.start()

The @scheduler.scheduled_job decorator schedules the job() function to run every 2 minutes. BlockingScheduler runs in the foreground, so scheduler.start() blocks the current thread until the scheduler is shut down.

Some key features include cron expressions, job stores, and configurable triggers.

APScheduler enables running robust scheduled tasks using Python code without separate job scheduler software.

Celery

Celery is another popular task queue and scheduler for Python, focused on distributed systems. A Celery deployment consists of a message broker, a result backend, and worker processes.

Celery is used to build complex asynchronous and distributed task processing pipelines in Python. It has more moving parts than APScheduler but provides advanced scheduling capabilities.

Airflow

Apache Airflow is an open source workflow automation and scheduling system written in Python. It is designed for executing data pipelines across diverse environments, with each workflow defined in Python code as a directed acyclic graph (DAG) of tasks.

Airflow is commonly used within data engineering teams for orchestrating ETL pipelines, data warehousing tasks, machine learning workflows, and other jobs with complex scheduling interdependencies.

Choosing a Scheduler

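The choice of scheduling library or system depends on the use case. APScheduler suits scheduled jobs that run inside a single Python application, Celery suits asynchronous and distributed task processing, and Airflow suits data pipelines with complex interdependencies.

Evaluate integration requirements, infrastructure, team skills, and workload patterns when selecting a scheduler.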

Real-life Automation Examples

Some examples of automating real-world tasks using Python scripting and scheduling:

System administration

Scripts to check disk space, monitor servers, and manage backups on a schedule. For example:

# Check disk usage
import shutil

total, used, free = shutil.disk_usage("/")
print(f"Total: {total//1024//1024} MB")
print(f"Used: {used//1024//1024} MB")
print(f"Free: {free//1024//1024} MB")

Web scraping

Scrape websites, parse content, generate reports. Schedule scripts to run daily.

# Scrape titles from news site
import requests
from bs4 import BeautifulSoup

response = requests.get("http://news-site.com", timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, 'html.parser')

for heading in soup.find_all('h2'):
  print(heading.text.strip())

Data pipelines

Schedule and orchestrate ETL jobs using Airflow to populate a data warehouse.

Image processing

Script for batch image manipulation, scheduled with cron.

Monitoring

Poll server resources and trigger alerts based on conditions.
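
A threshold check of this kind can be sketched with shutil.disk_usage. The 90 percent limit and the function names used_percent and check_disk are assumptions for illustration. A real script would send the alert by email or chat instead of returning a string.

```python
import shutil


def used_percent(total, used):
    """Percentage of space in use; 0.0 if the filesystem reports no capacity."""
    return used / total * 100 if total else 0.0


def check_disk(path="/", limit=90.0):
    """Return an alert message if usage of `path` exceeds `limit` percent, else None."""
    usage = shutil.disk_usage(path)
    percent = used_percent(usage.total, usage.used)
    if percent > limit:
        return f"ALERT: {path} is {percent:.1f}% full"
    return None


alert = check_disk("/")
print(alert or "Disk usage is within limits")
```

Run from a scheduler every few minutes, a function like this turns a manual check into an unattended alert.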

Reports

Generate Excel or CSV reports from databases or REST APIs on a schedule.
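
As an illustration, the sketch below builds a CSV report from a database query using only the standard library. The in-memory SQLite database, the orders table, and its columns are invented for the example. A real report would connect to an existing database.

```python
import csv
import io
import sqlite3

# Invented sample data standing in for a real database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 5.5), ("alice", 4.5)],
)

cursor = connection.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
)

# Write to an in-memory buffer; use open("report.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["customer", "total"])
writer.writerows(cursor)  # each row from the cursor is a tuple of column values
connection.close()

report = buffer.getvalue()
print(report)
```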

Conclusion

Python offers many benefits, such as simplicity, scalability, and a vast module ecosystem, that make it a great language for scripting and for automating all kinds of tasks using schedulers.

The standard library and third-party modules provide capabilities for the common automation patterns - shell commands, file handling, arguments, logging, error handling. Robust frameworks like APScheduler, Celery and Airflow enable scheduling repetitive jobs or complex workflows.

With some knowledge of Python syntax, data structures and a review of key modules, programmers can quickly build scripts that replace tedious manual processes with automated routines running precisely on schedule. The abundance of examples and documentation support rapid development cycles.

Automation improves efficiency, reduces costs and allows focusing human effort on creative, strategic tasks. Python plays a big role in enabling scalable automation across fields like data science, DevOps, administration and testing. This guide summarizes the main techniques and libraries to help readers kickstart implementing automation in Python.