How Can You Structure Your Python Script?

by Rohit Goswami · Jun 02, 2025

You may have begun your Python journey interactively, exploring ideas in Jupyter Notebooks or through the Python REPL. While that’s great for quick experimentation and immediate feedback, you’ll likely find yourself saving code into .py files. As your codebase grows, however, the structure of your Python scripts becomes increasingly important.

Transitioning from interactive environments to structured scripts helps promote readability, enabling better collaboration and more robust development practices. This tutorial transforms messy scripts into well-organized, shareable code. Along the way, you’ll learn standard Python practices and tools. These techniques bridge the gap between quick scripting and disciplined software development.

By the end of this tutorial, you’ll know how to:

  • Organize your Python scripts logically with functions, constants, and appropriate import practices.
  • Efficiently manage your script’s state using data structures such as enumerations and data classes.
  • Enhance interactivity through command-line arguments and improve robustness with structured feedback using logging and libraries like Rich.
  • Create self-contained, shareable scripts by handling dependencies inline using PEP 723.

Without further ado, it’s time to start working through a concrete script that interacts with a web server to obtain and manipulate a machine learning dataset.

Take the Quiz: Test your knowledge with our interactive “How Can You Structure Your Python Script?” quiz. You’ll receive a score upon completion to help you track your learning progress.

Setting the Stage for Scripting

Throughout this tutorial, you’ll apply the structuring concepts by building a Python script step-by-step. The goal of this script will be to work with the well-known Iris dataset, a classic dataset in machine learning containing measurements for three species of Iris flowers.

Your script, called iris_summary.py, will evolve through several stages, demonstrating different structural improvements. These stages are:

  1. Set Up the Initial Script: Begin with a functional script using standard language features. Apply a foundational structure using named constants for clarity and the entry-point guard to separate executable code from importable definitions.

  2. Integrate External Libraries and Dependencies: Incorporate third-party libraries when needed to leverage specialized functionality or simplify complex tasks. Declare and manage script dependencies within the file using standards like PEP 723 for better reproducibility.

  3. Handle Command-Line Arguments: Add command-line arguments using helper libraries to make the script interactive and configurable. Define a clear main() function to encapsulate the core script logic triggered by the command-line interface (CLI).

  4. Structure Internal Data: Improve how data is represented by selecting appropriate data structures. Move beyond basic types and use constructs like enum for fixed choices, or dataclass and namedtuple for structured records.

  5. Enhance Feedback and Robustness: Refine how the script communicates its progress and results. Implement structured logging instead of relying solely on print(). Use assert statements for internal consistency checks during development, and improve the terminal output presentation, potentially using libraries designed for richer interfaces, like Rich.

By following these steps, you’ll see how structure transforms a basic script into something more robust, readable, and shareable. Each new concept will be introduced and immediately applied to the evolving Iris script.

Before diving into the specifics of script structure, it’s important to understand some foundational elements that make your Python scripts executable and well-organized.

Using the Shebang Line

On Unix-like systems, such as Linux and macOS, you can make your Python script directly executable from the command line, like ./iris_summary.py, instead of always typing python iris_summary.py. This involves making the file executable with chmod +x iris_summary.py, and adding a shebang line at the top of your file.

The shebang tells the system which interpreter to use. The recommended, portable shebang for Python is:

Python
#!/usr/bin/env python3
# Your script logic goes here...

This small addition signals that your file is intended to be run as a standalone script.
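
With the shebang line in place, you grant execute permission once and can then launch the script directly:

Shell
$ chmod +x iris_summary.py
$ ./iris_summary.py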

Now that you know how to tell the operating system how to run your script, you can focus on organizing the code within the script, starting with imports.

Organizing the Import Statements

As your script starts interacting with more modules, the import statements at the top of your file become important for clarity and code quality. Python’s official style guide, PEP 8, recommends specific conventions for ordering imports, which significantly improves readability. Following these conventions is standard practice, and there are modern tools like Ruff to enforce these conventions.

Following a standard order helps anyone reading your code quickly understand its dependencies. The recommended grouping is:

  1. Standard Library Imports: Modules included with Python, like pathlib.
  2. Third-Party Imports: Libraries you’ve installed with pip, like requests.
  3. Local Imports: Local modules, either application files or libraries, such as when importing another .py file you wrote.
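
For example, the top of a script that follows this order might look like the sketch below, where my_helpers stands in for a hypothetical local module of your own:

Python
# Standard library imports
import pathlib
import urllib.request

# Third-party imports
import requests

# Local imports
import my_helpers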

A good scripting practice for sharing code is to avoid local or library-specific imports and to ensure that only cross-platform third-party packages are used.

Note that for simple, standalone scripts intended for easy sharing—for example, as a GitHub gist—minimizing dependencies is often a goal. This might mean sticking primarily to the standard library and avoiding local imports if possible.

However, as a script grows, using third-party libraries or local helper modules often becomes necessary and beneficial. Managing these dependencies properly—using PEP 723, as you’ll see later—is essential. This advice on minimizing dependencies is less applicable when you’re building a command-line interface (CLI) front end for an existing, complex local library.

Creating the Initial Script

Now, you’ll set up the first version of your iris_summary.py script. It’ll download the dataset using a standard library import:

Python iris_summary.py
#!/usr/bin/env python3

import urllib.request

urllib.request.urlretrieve(
    "https://archive.ics.uci.edu/static/public/53/iris.zip",
    "iris.zip"
)

print("Downloaded iris.zip")

If you save this code as iris_summary.py and run it, it’ll download the iris.zip file into the current directory:

Shell
$ python3 iris_summary.py
Downloaded iris.zip
$ ls
iris_summary.py iris.zip

This works, but hard-coding the URL and filename makes the script inflexible, and the download runs as a side effect the moment Python reads the file. You can address these issues and improve the execution flow by introducing functions and constants.

Adding Structure With Constants and Entry Points

Your minimal script now downloads the data, but it uses hard-coded strings for the URL and filename, and the download code runs immediately upon execution or import. You can improve on this by defining constants and establishing a proper script entry point.

It’s a good idea to collect the script-level constants into a block immediately below the imports. By convention, Python constants use UPPER_SNAKE_CASE variable names. While Python doesn’t prevent you from changing these variables, the naming signals the intent that they should remain fixed throughout the script’s execution.

You can update the script to define and use constants for the URL and filename like this:

Python iris_summary.py
 1#!/usr/bin/env python3
 2
 3import urllib.request
 4
 5IRIS_DATA_URL = "https://archive.ics.uci.edu/static/public/53/iris.zip"
 6LOCAL_ZIP_FILENAME = "iris.zip"
 7
 8urllib.request.urlretrieve(
 9    IRIS_DATA_URL,
10    LOCAL_ZIP_FILENAME
11)
12
13print(f"Downloaded {LOCAL_ZIP_FILENAME}")

This change makes the script more readable and maintainable. If the URL changes, then you only need to edit line 5.

However, the execution flow is still not ideal. The urlretrieve() call happens as soon as Python reads the file, whether you’re running it directly with python iris_summary.py, or importing it using import iris_summary. To control this, you need to use a main execution block.

The standard Pythonic way to define code that should only run when the file is executed as a script—and not when it’s imported—is to use a conditional block that checks the special built-in variable __name__.

When Python runs a file as the main script, it automatically sets the variable __name__ for that module to the string "__main__". However, when the file is imported by another module, __name__ is set to the module’s own name, such as iris_summary. This conditional check, often called the if __name__ == "__main__" idiom, can be used to guard your main execution logic:

Python iris_summary.py
#!/usr/bin/env python3

import urllib.request

IRIS_DATA_URL = "https://archive.ics.uci.edu/static/public/53/iris.zip"
LOCAL_ZIP_FILENAME = "iris.zip"

def main():
    """Fetch the Iris dataset from UCI."""
    urllib.request.urlretrieve(
        IRIS_DATA_URL,
        LOCAL_ZIP_FILENAME
    )
    print(f"Downloaded {LOCAL_ZIP_FILENAME}")

if __name__ == "__main__":
    main()

This version also introduces a function with a docstring. By convention, main() serves as the entry point to your script. Now, the download logic inside the if block only executes when you run the file directly. If you import iris_summary, then the constants and main() will be defined, but the download won’t happen automatically.

Ideally, scripts shouldn’t rely on import-time side effects and should instead provide an explicit entry point guarded by __name__. This pattern is fundamental for creating reusable and well-behaved Python modules and scripts. It cleanly separates the code defining what the script can do—such as functions and classes—from the specific actions it should perform when run directly.
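
You can verify this separation from a REPL session started in the same directory. Importing the module defines the constants and main(), while the download only happens when you call main() explicitly:

Python
>>> import iris_summary
>>> iris_summary.IRIS_DATA_URL
'https://archive.ics.uci.edu/static/public/53/iris.zip'
>>> iris_summary.main()
Downloaded iris.zip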

Managing Dependencies With PEP 723

Your current script uses urllib.request from the standard library to download the data. While this works, it only fetches the ZIP file. You’d still need to add code to unzip it and parse the data file or files.

Often, specialized third-party libraries can simplify common tasks. For interacting with the UCI Machine Learning Repository, the maintainers provide a dedicated library called ucimlrepo. This library handles the details of fetching and loading datasets, providing the data in a more structured format. Using it is generally preferable to manual downloading for UCI datasets.

To use ucimlrepo, however, you first need to install it, typically using pip in a virtual environment:

Windows PowerShell
PS> python -m venv venv
PS> venv\Scripts\Activate.ps1
(venv) PS> python -m pip install ucimlrepo
Shell
$ python3 -m venv venv/
$ source venv/bin/activate
(venv) $ python -m pip install ucimlrepo

Now your script can be refactored to use ucimlrepo:

Python iris_summary.py
 1#!/usr/bin/env python3
 2
 3from ucimlrepo import fetch_ucirepo
 4
 5IRIS_DATASET_ID = 53
 6
 7def main():
 8    """Fetch the Iris dataset and show a variable summary."""
 9    print("Fetching Iris dataset using ucimlrepo...")
10    iris = fetch_ucirepo(id=IRIS_DATASET_ID)
11    print("Dataset fetched successfully. Variable summary:")
12    print(iris.variables)
13
14if __name__ == "__main__":
15    main()

The fetch_ucirepo() function in line 10 handles both retrieval and loading of the dataset using the dataset ID defined as a constant in line 5. With the main() function suitably modified, you can now present a summary of the data:

Shell
(venv) $ python iris_summary.py
Fetching Iris dataset using ucimlrepo...
Dataset fetched successfully. Variable summary:
           name     role  ... units missing_values
0  sepal length  Feature  ...    cm             no
1   sepal width  Feature  ...    cm             no
2  petal length  Feature  ...    cm             no
3   petal width  Feature  ...    cm             no
4         class   Target  ...  None             no

[5 rows x 7 columns]

But how does someone else running your script know they need to install ucimlrepo? And how do you specify the correct version? This is where dependency management for scripts becomes important.

While full Python projects use files like pyproject.toml, scripts need to be more self-contained. A standard for declaring dependencies directly within a script, using specially formatted comments, is defined in PEP 723, which introduces inline script metadata for single-file scripts.

A tool that understands PEP 723 can read these comments and automatically create an environment with the specified dependencies to run the script. First, you’ll need to check which library version is installed in your environment:

Shell
(venv) $ python -m pip show ucimlrepo
Name: ucimlrepo
Version: 0.0.7
Summary: Package to easily import datasets from the UC Irvine
⮑ Machine Learning Repository into scripts and notebooks.
Home-page: https://github.com/uci-ml-repo/ucimlrepo
Author: Philip Truong
Author-email: Philip Truong <ucirepository@gmail.com>
License:
Location: /venv/lib/python3.13/site-packages
Requires: certifi, pandas
Required-by:

Then, you’ll embed the necessary metadata, including the library name and version:

Python iris_summary.py
#!/usr/bin/env python3

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "ucimlrepo==0.0.7",
# ]
# ///

from ucimlrepo import fetch_ucirepo

IRIS_DATASET_ID = 53

def main():
    """Fetch the Iris dataset and show a variable summary."""
    print("Fetching Iris dataset using ucimlrepo...")
    iris = fetch_ucirepo(id=IRIS_DATASET_ID)
    print("Dataset fetched successfully. Variable summary:")
    print(iris.variables)

if __name__ == "__main__":
    main()

The comment block at the top informs PEP 723-aware tools of the script’s requirements, including external dependencies.

To resolve the dependencies, the Python interpreter alone is no longer sufficient. A PEP 723-aware tool like pipx or uv is needed to set up a temporary virtual environment with the right dependencies for execution. As a fast and popular Rust-based Python installer and resolver, uv is a great choice, and you can install it with this command:

Windows PowerShell
PS> powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Shell
$ curl -LsSf https://astral.sh/uv/install.sh | sh

Before you can see the utility of this, you’ll need to deactivate the existing virtual environment, removing it for good measure:

Shell
(venv) $ deactivate && rm -rf venv/

Now your script with its dependencies can be shared and reproduced for execution:

Shell
$ uv run iris_summary.py
Installed 8 packages in 52ms
Fetching Iris dataset using ucimlrepo...
(...)

Regardless of the specific tool, the key idea of PEP 723 is to make the script self-documenting regarding its dependencies, enabling reproducible execution environments. Your script now uses a dedicated library for data fetching and clearly declares its dependencies, making it more robust and easier for others to run correctly.
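
As an aside, uv isn’t the only option: recent releases of pipx can also read PEP 723 inline metadata. Assuming pipx 1.4.2 or later is installed, the equivalent invocation is:

Shell
$ pipx run iris_summary.py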

Handling Command-Line Arguments

Your script now fetches the Iris dataset and summarizes its variables. You could keep adding summaries directly to the code, but in practice, you’ll want to interact with the program without editing its source. In other words, you want to control the flow of execution dynamically, which means passing arguments to the application at runtime.

While Python’s built-in argparse module can handle this, third-party libraries like Click offer a more intuitive and Pythonic way to create command-line interfaces using decorators.

Using Click, you can now support the option of getting the metadata for the dataset:

Python iris_summary.py
#!/usr/bin/env python3

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "click==8.1.8",
#   "ucimlrepo==0.0.7",
# ]
# ///

from pprint import pprint as pp

import click
from ucimlrepo import fetch_ucirepo

IRIS_DATASET_ID = 53

@click.command()
@click.option(
    "--operation",
    default="summary",
    type=click.Choice(["summary", "metadata"]),
    help="Operation to perform: variable summary or dataset metadata",
)
def main(operation):
    """Fetch and print a summary of the Iris dataset from UCI."""
    print("Fetching Iris dataset using ucimlrepo...")
    iris = fetch_ucirepo(id=IRIS_DATASET_ID)
    print("Dataset fetched successfully.")

    if operation == "summary":
        print("Variable summary:")
        pp(iris.variables)
    elif operation == "metadata":
        print("Metadata summary:")
        pp(iris.metadata)

if __name__ == "__main__":
    main()

The parameters of main() correspond to the options you define with Click’s decorators. Click automatically handles parsing the arguments, generating help messages, and calling your main() function with the correct values. For better output formatting, pprint represents nested dictionaries and other internal structures more readably than print(). At this point, your code can accept user input with validation, which you can use to check the metadata:

Shell
$ uv run iris_summary.py --operation metadata
Fetching Iris dataset using ucimlrepo...
Dataset fetched successfully.
Metadata summary:
{'abstract': 'A small classic dataset from Fisher, 1936. One of the earliest '
             'known datasets used for evaluating classification methods.\n',
 'additional_info': {'citation': None,
 (...)
 'uci_id': 53,
 'year_of_dataset_creation': 1936}

Your script now formats the nested metadata dictionary in a cleaner manner and accepts an optional --operation argument at runtime. You can also double-check the handling of invalid choices:

Shell
$ uv run iris_summary.py --operation variables
Usage: iris_summary.py [OPTIONS]
Try 'iris_summary.py --help' for help.

Error: Invalid value for '--operation':
⮑ 'variables' is not one of 'summary', 'metadata'.

As shown, an unsupported operation leads to an informative error message. By adding Click, you’ve made your script flexible and user-friendly, allowing runtime customization through command-line arguments. Now that your script effectively handles external inputs and commands, you can turn your attention inward and explore how to better represent and manage the data used within the script itself using specialized data structures.

Choosing Appropriate Data Structures for Scripts

Your script now fetches data based on user commands. As you start processing that data or managing internal configurations, it becomes important to represent that information in ways that support clarity and maintenance. Python offers several powerful data structures suitable for different needs within scripts. Choosing the right one involves considering factors like readability, mutability, boilerplate code, and even testing implications.

Using enum for Constants and Options

Currently, you’re using the raw integer 53 for the Iris dataset ID. While storing it in a constant IRIS_DATASET_ID was an improvement, what if your script needed to handle multiple known datasets?

Using an enumeration provides a more structured and readable way to manage such fixed sets of identifiers. You can define one for known UCI datasets using the IntEnum class to ensure that integers are provided to fetch_ucirepo():

Python iris_summary.py
# ...

from enum import IntEnum
from pprint import pprint as pp

import click
from ucimlrepo import fetch_ucirepo

class UCIDataset(IntEnum):
    IRIS = 53

@click.command()
@click.option(
    "--operation",
    default="summary",
    type=click.Choice(["summary", "metadata"]),
    help="Operation to perform: variable summary or dataset metadata",
)
def main(operation):
    """Fetch and print a summary of the Iris dataset from UCI."""
    print("Fetching Iris dataset using ucimlrepo...")
    iris = fetch_ucirepo(id=UCIDataset.IRIS.value)
    print("Dataset fetched successfully.")

    # ...

# ...

By aliasing the integer with an enumeration, your script becomes more self-documenting, and you gain a single place to define the allowed values for each context. For scripts in particular, enum is excellent for mapping string or integer inputs to internal states, defining command sets, or representing any fixed category.

The StrEnum, introduced in Python 3.11, is especially convenient when you need enum members that also behave like strings. The Iris dataset contains four feature columns:

  1. sepal length
  2. sepal width
  3. petal length
  4. petal width

To allow selection of each column from the command line, you might use a StrEnum as follows:

Python iris_summary.py
# ...

from enum import IntEnum, StrEnum
from pprint import pprint as pp

import click
from ucimlrepo import fetch_ucirepo

class UCIDataset(IntEnum):
    IRIS = 53

class IrisVariable(StrEnum):
    PETAL_LENGTH = "petal length"
    PETAL_WIDTH = "petal width"
    SEPAL_WIDTH = "sepal width"
    SEPAL_LENGTH = "sepal length"

@click.command()
@click.option(
    "--operation",
    default="summary",
    type=click.Choice(["summary", "metadata"]),
    help="Operation to perform: variable summary or dataset metadata",
)
@click.option(
    "--variable",
    type=click.Choice(IrisVariable),
    help="Variable to summarize.",
    required=False,
)
def main(operation, variable):
    """Fetch and print a summary of the Iris dataset from UCI."""
    print("Fetching Iris dataset using ucimlrepo...")
    iris = fetch_ucirepo(id=UCIDataset.IRIS.value)
    print("Dataset fetched successfully.")

    if operation == "summary":
        if variable:
            print(f"{IrisVariable(variable)} summary:")
            pp(iris.data.features[IrisVariable(variable).value])
        else:
            print("All variables:")
            pp(iris.variables)
    elif operation == "metadata":
        print("Metadata summary:")
        pp(iris.metadata)

# ...

Your script now declares the allowed choices in a single enumeration instead of hard-coding them at every point of use. With this, you can view a summary of a given variable name:

Shell
$ uv run iris_summary.py --operation summary --variable "sepal length"
Fetching Iris dataset using ucimlrepo...
Dataset fetched successfully.
sepal length summary:
0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ...
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepal length, Length: 150, dtype: float64

Along the same lines, you can refactor the --operation flag into its own enumeration:

Python iris_summary.py
# ...

from enum import IntEnum, StrEnum, auto
from pprint import pprint as pp

import click
from ucimlrepo import fetch_ucirepo

class UCIDataset(IntEnum):
    IRIS = 53

class IrisVariable(StrEnum):
    PETAL_LENGTH = "petal length"
    PETAL_WIDTH = "petal width"
    SEPAL_WIDTH = "sepal width"
    SEPAL_LENGTH = "sepal length"

class Operation(StrEnum):
    SUMMARY = auto()
    METADATA = auto()

@click.command()
@click.option(
    "--operation",
    default=Operation.SUMMARY,
    type=click.Choice(Operation),
    help="Operation to perform: variable summary or dataset metadata",
)
@click.option(
    "--variable",
    type=click.Choice(IrisVariable),
    help="Variable to summarize.",
    required=False,
)
def main(operation, variable):
    """Fetch and print a summary of the Iris dataset from UCI."""
    print("Fetching Iris dataset using ucimlrepo...")
    iris = fetch_ucirepo(id=UCIDataset.IRIS.value)
    print("Dataset fetched successfully.")

    if operation is Operation.SUMMARY:
        if variable:
            print(f"{IrisVariable(variable)} summary:")
            pp(iris.data.features[IrisVariable(variable).value])
        else:
            print("All variables:")
            pp(iris.variables)
    elif operation is Operation.METADATA:
        print("Metadata summary:")
        pp(iris.metadata)

# ...

Using auto(), you can generate member values automatically instead of typing each string by hand. In a StrEnum, auto() assigns each member the lowercase form of its name, which keeps the command-line strings in sync with their declarations. A quick REPL session confirms this behavior:
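
Python
>>> from enum import StrEnum, auto
>>> class Operation(StrEnum):
...     SUMMARY = auto()
...     METADATA = auto()
...
>>> Operation.SUMMARY
<Operation.SUMMARY: 'summary'>
>>> Operation("metadata") is Operation.METADATA
True

Enumerations are excellent for managing predefined choices or mappings like this. However, when you need to structure more complex information returned by functions or generated during processing, other data structures are often more suitable. Next, you’ll explore how data classes can help represent structured data records flexibly.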

Using dataclass for Flexible Records

When processing data or passing structured information between parts of your script, using data classes provides a great balance of features and convenience.

Data classes use type hints to define fields and automatically generate useful methods like .__init__() and .__repr__(). This significantly reduces the boilerplate code you’d typically write for a manual class definition when primarily storing data.

Furthermore, you can easily add custom methods to bundle behavior with the data. For instance, you might want to calculate and store descriptive statistics—such as a measure of distributional shape from the difference between the mean and median—directly within the data structure:

Python iris_summary.py
#!/usr/bin/env python3

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "click==8.1.8",
#   "pandas==2.2.3",
#   "ucimlrepo==0.0.7",
# ]
# ///

from dataclasses import dataclass, field
from enum import IntEnum, StrEnum, auto
from pprint import pformat, pprint as pp

import click
import pandas as pd
from ucimlrepo import fetch_ucirepo

# ...

@dataclass
class DescriptiveStatistics:
    data: pd.Series
    mean: float = field(init=False)
    median: float = field(init=False)
    mm_diff: float = field(init=False)

    def __post_init__(self):
        if not isinstance(self.data, pd.Series):
            raise TypeError(
                f"data must be a pandas Series, not {type(self.data)}"
            )
        self.mean = self.data.mean()
        self.median = self.data.median()
        self.mm_diff = self.mean - self.median

    def __str__(self):
        return pformat(self)

# ...
def main(operation, variable):
    """Fetch and print a summary of the Iris dataset from UCI."""
    print("Fetching Iris dataset using ucimlrepo...")
    iris = fetch_ucirepo(id=UCIDataset.IRIS.value)
    print("Dataset fetched successfully.")

    if operation is Operation.SUMMARY:
        if variable:
            print(f"{IrisVariable(variable)} summary:")
            print(
                DescriptiveStatistics(
                    iris.data.features[IrisVariable(variable).value]
                )
            )
        else:
            print("All variables:")
            pp(iris.variables)
    elif operation is Operation.METADATA:
        print("Metadata summary:")
        pp(iris.metadata)

# ...

The post-initialization of your data class ensures the population of the summary statistics of interest, and a customized .__str__() method ensures a clean string representation when printed:

Shell
$ uv run iris_summary.py --operation summary --variable "sepal length"
Fetching Iris dataset using ucimlrepo...
Dataset fetched successfully.
sepal length summary:
DescriptiveStatistics(data=0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ...
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepal length, Length: 150, dtype: float64,
                      mean=np.float64(5.843333333333334),
                      median=np.float64(5.8),
                      mm_diff=np.float64(0.04333333333333389))

This demonstrates how you can use a data class to neatly bundle data with its associated processing logic, making the main script flow cleaner.

Considering Other Structures

You may also define a full custom class. These are typically reserved for cases where object-oriented features like inheritance or complex patterns are truly needed.

However, for many scripts, the premium on brevity means that writing and maintaining methods like .__init__() and .__repr__() by hand is overkill compared to using a data class. Because comprehensive testing often isn’t part of a single-file script’s initial design, highly complex class hierarchies are also harder to justify.

When a function needs to return simple, immutable records, or you need basic data containers with named field access and minimal boilerplate, collections.namedtuple remains a concise and efficient option. Its terse single-line definition, guaranteed immutability, low overhead, and clear name-based access make it ideal for scripting scenarios where associated methods or the features of data classes aren’t required.
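
As a minimal sketch, with FetchResult and its fields as hypothetical names chosen purely for illustration, a namedtuple record looks like this:

Python
from collections import namedtuple

# A small, immutable record with named field access.
FetchResult = namedtuple("FetchResult", ["dataset_id", "name", "n_rows"])

result = FetchResult(dataset_id=53, name="iris", n_rows=150)
print(result.name, result.n_rows)  # iris 150
# result.n_rows = 151  # Would raise AttributeError: namedtuples are immutable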

Selecting the appropriate structure lays a solid foundation for managing data within your script. With internal organization improved, the next step is to consider how your script communicates its actions and results, moving beyond basic output.

Improving Script Feedback

Your script now has a good internal structure for data representation and handling arguments. But how does it communicate what it’s doing, especially during development or when things go wrong? Relying solely on Python’s print() function has limitations.

It’s worthwhile to explore more structured ways to provide feedback and ensure correctness using logging, assertions, and the Rich library. From a scripting perspective, while print() works for basic output, it comes with notable limitations—especially in larger or more complex applications:

  • It mixes outputs: Status updates, debug information, and final results go through the same channel, making it hard to separate them.
  • It has no severity levels: There’s no built-in way to easily distinguish between informational messages, warnings, and critical errors.
  • It’s hard to control: Turning debug messages on and off requires manually adding and removing print() calls.

Python’s built-in logging module provides a much more flexible and standard way to record events. You can configure different logging levels, including DEBUG, INFO, WARNING, ERROR, and CRITICAL, direct output to files or the console, and control formatting. You can leverage pprint with logging using pformat(), and otherwise replace print() with logging.info() for the most part.

Knowing this, you can now refactor your script to incorporate basic logging as shown below:

Python iris_summary.py
 1# ...
 2
 3import logging
 4import sys
 5from dataclasses import dataclass, field
 6from enum import IntEnum, StrEnum, auto
 7from pprint import pformat
 8
 9import click
10import pandas as pd
11from ucimlrepo import fetch_ucirepo
12
13logging.basicConfig(
14    level=logging.INFO,
15    format="%(asctime)s - %(levelname)s - %(message)s"
16)
17
18# ...
19def main(operation, variable):
20    """Fetch and print a summary of the Iris dataset from UCI."""
21    iris = fetch_iris()
22    if operation is Operation.SUMMARY:
23        if variable:
24            logging.info(f"{IrisVariable(variable)} summary:")
25            logging.info(
26                DescriptiveStatistics(
27                    iris.data.features[IrisVariable(variable).value]
28                )
29            )
30        else:
31            logging.info("All variables:")
32            logging.info(pformat(iris.variables))
33    elif operation is Operation.METADATA:
34        logging.info("Metadata summary:")
35        logging.info(pformat(iris.metadata))
36
37def fetch_iris():
38    """Return the Iris dataset from the UCI ML Repository."""
39    logging.info("Fetching Iris dataset...")
40    try:
41        iris_data = fetch_ucirepo(id=UCIDataset.IRIS.value)
42    except Exception as e:
43        logging.critical(f"Failed to correctly fetch Iris dataset: {e}")
44        sys.exit(1)
45    else:
46        logging.info("Iris dataset fetched successfully")
47        return iris_data
48
49if __name__ == "__main__":
50    main()

Following convention, the minimum logging level and format are set at the module level in lines 13 to 16. The data-fetching logic moves to a dedicated function, fetch_iris(), which logs messages at the appropriate severity and aborts on fatal errors by calling sys.exit().

For potentially large or nested data like the dataset metadata, pformat() is used within logging.info() to generate a multiline, formatted string suitable for logging. Now, instead of raw print() output, your script generates timestamped logs categorized by severity level:

Shell
$ uv run iris_summary.py --operation summary --variable "sepal length"
2025-04-16 20:56:37,633 - INFO - Fetching Iris dataset...
2025-04-16 20:56:44,107 - INFO - Iris dataset fetched successfully
2025-04-16 20:56:44,107 - INFO - sepal length summary:
2025-04-16 20:56:44,108 - INFO - DescriptiveStatistics(data=0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ...
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepal length, Length: 150, dtype: float64,
                      mean=np.float64(5.843333333333334),
                      median=np.float64(5.8),
                      mm_diff=np.float64(0.04333333333333389))

Your script now generates configurable status output, with every message tagged by severity. The minimum level, and therefore the verbosity, can be controlled by a user-defined parameter, such as a command-line argument or an environment variable, as sketched below.
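
Here’s a minimal sketch of that idea, assuming a hypothetical --verbose flag rather than the options your script already defines:

Python
import logging

import click

@click.command()
@click.option("--verbose", is_flag=True, help="Enable debug output.")
def main(verbose):
    # Raise verbosity to DEBUG only when the user asks for it.
    logging.basicConfig(level=logging.DEBUG if verbose else logging.INFO)
    logging.debug("Visible only with --verbose")
    logging.info("Always visible")

if __name__ == "__main__":
    main()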

With structured logging handling the script’s communication about its progress and runtime events, you might also want ways to verify internal assumptions during development. While logging reports on what happened, sometimes you need checks to ensure the script’s state is exactly what you expect it to be before proceeding. Python’s assert statement is designed for exactly this kind of internal sanity check.

Adding Internal Checks With assert

Sometimes, during development, you want to add checks to ensure the script’s internal state is as expected. The assert statement is perfect for this. It takes a condition and an optional message. If the condition is False, then it raises an AssertionError with the message. Otherwise, the script continues executing normally.

Asserts are primarily debugging aids for the developer. They’re not intended to handle expected user errors like invalid user input—which Click can handle—or FileNotFoundError exceptions, which should be caught with try...except. Instead, they verify your own assumptions about the code’s state.

A key feature of assertions is that they can be disabled globally if Python is run with the -O (optimize) flag, as in python -O iris_summary.py, meaning they have no performance impact in optimized runs.
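
You can see the flag in action from the command line. With -O, the failing assertion is skipped entirely:

Shell
$ python3 -c "assert False, 'checks enabled'"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: checks enabled
$ python3 -O -c "assert False, 'checks enabled'"
$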

For your script, you can add a simple assertion after fetching the data to ensure the returned object looks correct:

Python iris_summary.py
# ...

def fetch_iris():
    """Return the Iris dataset from the UCI ML Repository."""
    logging.info("Fetching Iris dataset...")
    try:
        iris_data = fetch_ucirepo(id=UCIDataset.IRIS.value)
        assert "data" in iris_data.keys(), \
            "Object does not have expected structure"
    except Exception as e:
        logging.critical(f"Failed to correctly fetch Iris dataset: {e}")
        sys.exit(1)
    else:
        logging.info("Iris dataset fetched successfully")
        return iris_data

if __name__ == "__main__":
    main()

If later versions of ucimlrepo change the internal structure, then the script will raise an informative error:

Shell
$ uv run iris_summary.py --operation summary --variable "sepal length"
2025-04-16 21:10:20,180 - INFO - Fetching Iris dataset...
2025-04-16 21:10:21,601 - CRITICAL - Failed to correctly fetch Iris dataset:
⮑ Object does not have expected structure

This message signals to a developer that an internal assumption has been violated.

Enhancing Output With Rich

While logging handles status and debug messages, you often want the final output to be presented clearly and attractively to the user. The Rich library is fantastic for creating beautiful terminal output across different operating systems with colors, tables, Markdown, progress bars, and more.

You can also use it to override the default handler for logging and exceptions:

Python iris_summary.py
#!/usr/bin/env python3

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "click==8.1.8",
#   "pandas==2.2.3",
#   "rich==14.0.0",
#   "ucimlrepo==0.0.7",
# ]
# ///

import logging
import sys
from dataclasses import dataclass, field
from enum import IntEnum, StrEnum, auto
from pprint import pformat

import click
import pandas as pd
from rich.logging import RichHandler
from ucimlrepo import fetch_ucirepo

logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s - %(message)s",
    handlers=[RichHandler(rich_tracebacks=True)]
)

# ...

Now, the date and time are handled by RichHandler. This leads to much cleaner output:

Rich Presentation of the Iris Dataset

Beyond logs and exceptions, you typically create a Console object to handle printing. You can then use specific Rich objects like Table.

Here’s an example of how you can replace the basic printing of the variable summary statistics with a Rich table:

Python iris_summary.py
# ...

import click
import pandas as pd
from rich.console import Console
from rich.logging import RichHandler
from rich.table import Table
from ucimlrepo import fetch_ucirepo

# ...

# ...
def main(operation, variable):
    """Fetch and print a summary of the Iris dataset from UCI."""
    console = Console()
    iris = fetch_iris()
    if operation is Operation.SUMMARY:
        if variable:
            table = generate_table(iris, variable)
            console.print(table)
        else:
            logging.info("All variables:")
            logging.info(pformat(iris.variables))
    elif operation is Operation.METADATA:
        logging.info("Metadata summary:")
        logging.info(pformat(iris.metadata))

# ...

def generate_table(dataset, variable):
    """Generate a formatted table of descriptive statistics for a variable."""
    column = IrisVariable(variable).value
    stats = DescriptiveStatistics(dataset.data.features[column])
    table = Table(title=f"{column} summary")
    table.add_column("Metric", style="cyan", justify="right")
    table.add_column("Value", style="magenta")
    table.add_row("Mean", f"{stats.mean:.2f}")
    table.add_row("Median", f"{stats.median:.2f}")
    table.add_row("Mean-Median Diff", f"{stats.mm_diff:.2f}")
    return table

if __name__ == "__main__":
    main()

This revised code introduces a generate_table() function that encapsulates the creation of a formatted rich.Table object, populated with the statistics derived from the DescriptiveStatistics data class.

A Console object is used within the main() function for printing. It takes responsibility for interpreting objects that carry style information, rendering the styled, formatted table directly in your terminal. The result is a much clearer and more aesthetically pleasing presentation than standard print() can provide.

By generating a statistics table and printing it with Rich’s colored output, you create a more polished final result:

Shell
$ uv run iris_summary.py --operation summary --variable "sepal length"
[04/17/25 09:21:55] INFO     INFO - Fetching Iris dataset...  iris_summary.py:93
[04/17/25 09:21:56] INFO     INFO - Iris dataset fetched     iris_summary.py:102
                             successfully
    sepal length summary
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃           Metric ┃ Value ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│             Mean │ 5.84  │
│           Median │ 5.80  │
│ Mean-Median Diff │ 0.04  │
└──────────────────┴───────┘

When you combine logging with Rich-rendered output, you’ll need a helper function to strip out ANSI color codes, since the standard logging library can only handle plain strings:

Python iris_summary.py
# ...

import click
import pandas as pd
from rich.console import Console, Text
from rich.logging import RichHandler
from rich.table import Table
from ucimlrepo import fetch_ucirepo

# ...

# ...
def main(operation, variable):
    """Fetch and print a summary of the Iris dataset from UCI."""
    iris = fetch_iris()
    if operation is Operation.SUMMARY:
        if variable:
            table = generate_table(iris, variable)
            logging.info(format_rich_for_log(table))
            logging.info(f"{IrisVariable(variable)} summary:")
            logging.info(
                DescriptiveStatistics(
                    iris.data.features[IrisVariable(variable).value]
                )
            )
        else:
            logging.info("All variables:")
            logging.info(pformat(iris.variables))
    elif operation is Operation.METADATA:
        logging.info("Metadata summary:")
        logging.info(pformat(iris.metadata))

# ...

def format_rich_for_log(renderable, width=100):
    """Render a rich object to a plain text string suitable for logging."""
    console = Console(width=width)
    with console.capture() as capture:
        console.print(renderable)
    return Text.from_ansi(capture.get())

if __name__ == "__main__":
    main()

You essentially need to print the table to a separate console object to generate the plain-text representation for the log. This kind of log is often better suited to more complex scenarios than simple scripts.

Often, the user experience is of primary importance, to the point that it’s acceptable to use console.print() throughout the main() function of the script. Logging details should be delegated to other functions, like the fetch_iris() helper.

By incorporating logging, assert statements, and libraries like Rich, you can make your scripts more robust during development and provide much clearer, more helpful feedback during execution.

Following Python Script Structure Recommendations

Based on the structures and techniques you’ve learned about, here are a few recommendations to keep in mind when you’re writing Python scripts:

  • Strive for brevity and clarity: Scripts often benefit from being direct. Use clear names for constants, functions, and variables. While functions and classes help organize your code, avoid overly deep nesting or abstraction if simpler, linear code within the main block is easier to follow for a specific task.
  • Leverage argument parsing for input validation: Tools like Click are excellent not just for defining arguments, but also for validating user input at the boundary of your script—for example, using click.Choice or type=int. Handling input validation here often reduces the need for extensive try...except blocks that check types or values deep within your core logic functions, keeping them cleaner.
  • Embrace self-contained dependencies: For scripts you intend to share, PEP 723 is invaluable. Declaring dependencies within the script file makes it reproducible and much easier for others—and your future self—to run correctly using tools like uv or pipx.
  • Choose data structures wisely: Select the simplest structure that meets your needs for clarity and maintainability.

Here’s a quick reference table comparing common data structures in a script context. It summarizes when each one is most appropriate based on your script’s complexity and goals:

| Structure | Use Case | Recommendation |
| --- | --- | --- |
| enum.Enum | Representing fixed sets of choices, states, modes, and mapping inputs | Use for clarity and type safety over raw strings or integers for predefined choices |
| collections.namedtuple | Simple, immutable data bundles and function return values; named access with low overhead | Use for concise, fixed records where immutability is paramount |
| dataclasses.dataclass | Flexible data records with typing, less boilerplate, and easy method addition | A great default for most structured data; balances features, readability, and ease |
| class (custom) | Complex state, behavior, and inheritance patterns; full OOP control | Use when full OOP power is necessary; consider verbosity and testing needs |

For the level of complexity in your script, data classes and enumerations offer the most suitable combination of structure and simplicity.

Putting It All Together

You’ve seen how each concept—constants, a guarded entry point, PEP 723 dependencies, argument handling, internal structures, logging, and primary output—helps to structure and strengthen a Python script. If you’d like to see the complete, final version of the iris_summary.py script, click the Show/Hide toggle below:

Python iris_summary.py
#!/usr/bin/env python3

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "click==8.1.8",
#   "pandas==2.2.3",
#   "rich==14.0.0",
#   "ucimlrepo==0.0.7",
# ]
# ///

import logging
import sys
from dataclasses import dataclass, field
from enum import IntEnum, StrEnum, auto
from pprint import pformat

import click
import pandas as pd
from rich.console import Console, Text
from rich.logging import RichHandler
from rich.table import Table
from ucimlrepo import fetch_ucirepo

logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s - %(message)s",
    handlers=[RichHandler(rich_tracebacks=True)]
)

class UCIDataset(IntEnum):
    IRIS = 53

class IrisVariable(StrEnum):
    PETAL_LENGTH = "petal length"
    PETAL_WIDTH = "petal width"
    SEPAL_WIDTH = "sepal width"
    SEPAL_LENGTH = "sepal length"

class Operation(StrEnum):
    SUMMARY = auto()
    METADATA = auto()

@dataclass
class DescriptiveStatistics:
    data: pd.Series
    mean: float = field(init=False)
    median: float = field(init=False)
    mm_diff: float = field(init=False)

    def __post_init__(self):
        if not isinstance(self.data, pd.Series):
            raise TypeError(
                f"data must be a pandas Series, not {type(self.data)}"
            )
        self.mean = self.data.mean()
        self.median = self.data.median()
        self.mm_diff = self.mean - self.median

    def __str__(self):
        return pformat(self)

@click.command()
@click.option(
    "--operation",
    default=Operation.SUMMARY,
    type=click.Choice(Operation),
    help="Operation to perform: variable summary or dataset metadata",
)
@click.option(
    "--variable",
    type=click.Choice(IrisVariable),
    help="Variable to summarize.",
    required=False,
)
def main(operation, variable):
    """Fetch the Iris dataset from UCI."""
    iris = fetch_iris()
    if operation is Operation.SUMMARY:
        if variable:
            table = generate_table(iris, variable)
            logging.info(format_rich_for_log(table))
            logging.info(f"{IrisVariable(variable)} summary:")
            logging.info(
                DescriptiveStatistics(
                    iris.data.features[IrisVariable(variable).value]
                )
            )
        else:
            logging.info("All variables:")
            logging.info(pformat(iris.variables))
    elif operation is Operation.METADATA:
        logging.info("Metadata summary:")
        logging.info(pformat(iris.metadata))

def fetch_iris():
    """Return the Iris dataset from the UCI ML Repository."""
    logging.info("Fetching Iris dataset...")
    try:
        iris_data = fetch_ucirepo(id=UCIDataset.IRIS.value)
        assert "data" in iris_data.keys(), \
            "Object does not have expected structure"
    except Exception as e:
        logging.critical(f"Failed to correctly fetch Iris dataset: {e}")
        sys.exit(1)
    else:
        logging.info("Iris dataset fetched successfully")
        return iris_data

def generate_table(dataset, variable):
    """Generate a formatted table of descriptive statistics for a variable."""
    column = IrisVariable(variable).value
    stats = DescriptiveStatistics(dataset.data.features[column])
    table = Table(title=f"{column} summary")
    table.add_column("Metric", style="cyan", justify="right")
    table.add_column("Value", style="magenta")
    table.add_row("Mean", f"{stats.mean:.2f}")
    table.add_row("Median", f"{stats.median:.2f}")
    table.add_row("Mean-Median Diff", f"{stats.mm_diff:.2f}")
    return table

def format_rich_for_log(renderable, width=100):
    """Render a rich object to a plain text string suitable for logging."""
    console = Console(width=width)
    with console.capture() as capture:
        console.print(renderable)
    return Text.from_ansi(capture.get())

if __name__ == "__main__":
    main()

As you can see, applying key structuring concepts and feedback transforms a simple task into a well-structured program. This approach makes your scripts significantly easier to understand, modify, and share.

Note that this final version keeps the table as part of the log stream, converting it with format_rich_for_log() before passing it to logging.info(). If the table is your primary output, you could instead print it with console.print() in main() and reserve logging for helper functions like fetch_iris(), which would avoid the conversion step entirely. The assert statement demonstrated earlier remains in fetch_iris(), though such checks are often removed or refined in final scripts unless they verify critical invariants.

Conclusion

Congratulations! You now have a solid understanding of how to structure your Python scripts effectively, moving beyond simple top-down execution to create more organized, readable, maintainable, and shareable code. You’ve seen how applying standard Python features and conventions, along with useful libraries, can significantly improve your scripts.

In this tutorial, you’ve learned how to:

  • Organize your scripts with standard import groupings and meaningful constants
  • Define a clear script entry point using if __name__ == "__main__" to control execution
  • Make scripts directly executable on Unix-like systems with a shebang
  • Make scripts self-contained by managing dependencies directly within the file using PEP 723
  • Build flexible command-line interfaces using decorators with Click
  • Improve script feedback and debugging using logging, assert, and enhanced terminal output with Rich

By applying these techniques, your scripts become easier to understand, modify, and share with others. Continuously applying these structural principles and quality practices will help you write Python scripts that aren’t just functional, but also robust and professional. Happy scripting!

Frequently Asked Questions

Now that you have some experience with structuring your Python scripts, you can use the questions and answers below to check your understanding and recap what you’ve learned.

These FAQs are related to the most important concepts you’ve covered in this tutorial.

What is a Python script, and how do you structure one?

A Python script is a file containing Python code, typically ending with a .py extension, that you can execute directly to perform specific tasks. You structure it using import statements, constants, functions, and a main execution block.

How do you make a Python script executable?

You add a shebang line (#!/usr/bin/env python3) at the top of your script, and change the file permissions to make it executable with chmod +x script.py.

Why do you use if __name__ == "__main__" in a script?

You use if __name__ == "__main__" to ensure that certain code only runs when you execute the script directly, not when you import it as a module.

How do you manage dependencies in a standalone script?

You manage dependencies by declaring them within the script using PEP 723 comments, which allows tools like uv to create a suitable environment automatically.

How do you handle command-line arguments in a script?

You handle command-line arguments in Python scripts using libraries like Click, which allow you to define options and commands with decorators for flexible and user-friendly interfaces.
