AI-Powered Rewrite from Python to Rust - part 3

In the previous part, we restructured our tests into contract tests. Both TestLogParserPython and TestLogParserRust now inherit from the same LogParserContract class. The Python tests pass. The Rust tests fail because there's no real implementation yet. Time to change that.

This article is part 3 of the series "When Python Hits the Wall: AI-Powered Rewrite from Python to Rust".

Part 1

Part 2

Part 3

You can find the repository with all examples here

Prerequisites

Before we start, we need to have the right tools installed on our machine.

Rust toolchain

First, we need the Rust compiler and Cargo (Rust's package manager). The easiest way to install them is via rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

After installation, verify it works:

rustc --version
cargo --version

PyO3 and maturin

We'll use two key tools to connect Rust and Python:

PyO3 is a Rust library that provides bindings between Rust and Python. It lets us write Rust functions that can be called directly from Python - no subprocess overhead, no serialization. It's the same approach that Pydantic uses under the hood.

maturin is the build tool that compiles our Rust code into a native Python extension module (a .so file on Linux/macOS, .pyd on Windows). It handles all the complexity of building Rust code and packaging it so Python can import it directly. We'll add maturin as a dev dependency, so there's nothing to install globally.
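To see which file suffixes our own interpreter treats as native extension modules, we can ask Python directly. This is only a small sketch for orientation; the helper name is_native_extension is made up for this example and is not part of maturin or PyO3.

```python
import importlib.machinery


def is_native_extension(filename: str) -> bool:
    """Return True if this interpreter would load the file as a native module.

    Hypothetical helper for illustration only.
    """
    return any(
        filename.endswith(suffix)
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )


# The exact list depends on the platform and Python version.
print(importlib.machinery.EXTENSION_SUFFIXES)
print(is_native_extension("rust_log_parser.py"))
```

The file that maturin builds ends with one of these suffixes, which is why a plain import statement can find it.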

The prompt

With the prerequisites in place, we can prompt our AI agent. I'm using Claude Code in auto-accept-edits mode, and I give it the following prompt:

  Implement the log parser in Rust as a PyO3 extension module so the TestLogParserRust contract tests pass. Specifically:

  1. Create Cargo.toml at the project root for a cdylib crate named _rust_log_parser with pyo3 and regex as dependencies.
  2. Create src/lib.rs that exposes a single parse_file(path: str) -> list[dict] function. Each dict should have keys timestamp (a datetime.datetime or None), level (str or None), fields (dict), and raw (str). Port all parsing logic from log_parser.py: timestamps, levels, field types (strings, ints, floats, booleans, null, nested objects), noise line filtering, and escaped quotes.
  3. Update rust_log_parser.py to import _rust_log_parser and convert the returned dicts into LogEntry instances.
  4. Add maturin as the build backend in pyproject.toml ([build-system] with requires = ["maturin>=1.0,<2.0"] and build-backend = "maturin"). Also, add a [tool.maturin] section with features = ["pyo3/extension-module"].
  5. Build with uv run maturin develop and run uv run pytest test_log_parser.py -v to verify all 74 tests pass (both TestLogParserPython and TestLogParserRust).

The prompt is specific about the structure: what files to create, what function signature to expose, and how to wire things up. This is important because the AI needs to know how the Rust side should talk to Python.
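The dict shape requested in step 2 can be written down as a type, which makes the boundary between Rust and Python easier to reason about. The following is a sketch under assumptions: RawEntry and has_expected_shape are hypothetical names for this example and do not appear in the repository.

```python
from datetime import datetime
from typing import Any, Optional, TypedDict


class RawEntry(TypedDict):
    """Hypothetical type for one dict returned by parse_file."""

    timestamp: Optional[datetime]
    level: Optional[str]
    fields: dict[str, Any]
    raw: str


# The four keys named in the prompt.
REQUIRED_KEYS = frozenset({"timestamp", "level", "fields", "raw"})


def has_expected_shape(entry: dict) -> bool:
    """Check that an entry has exactly the keys the prompt asks for."""
    return set(entry) == REQUIRED_KEYS


example: RawEntry = {
    "timestamp": None,
    "level": "INFO",
    "fields": {"code": 200},
    "raw": "[INFO] code=200",
}
print(has_expected_shape(example))
```

A check like this is optional. The contract tests already exercise the real boundary, but a typed description helps when reading the wrapper code.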

Let's walk through what the AI produces.

Cargo.toml

This is the Rust project configuration:

[package]
name = "rust-log-parser"
version = "0.1.0"
edition = "2021"

[lib]
name = "_rust_log_parser"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.28", features = ["extension-module"] }
regex = "1"

A few things to note:

crate-type = ["cdylib"] tells Rust to compile into a C-compatible dynamic library - this is what Python imports as a native module.
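name = "_rust_log_parser" must match both the name of the #[pymodule] function in src/lib.rs and the module name we import in Python. If they differ, the import fails because Python cannot find the module's init function.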
pyo3 with the extension-module feature provides the Python bindings.
regex handles the timestamp and level pattern matching, just like re in the Python version.

src/lib.rs

This is the Rust implementation. It's a direct port of the Python LogParser:

use pyo3::prelude::*;
use pyo3::types::{PyDict, PyList};
use regex::Regex;
use std::fs;
use std::sync::LazyLock;

static TIMESTAMP_PATTERNS: LazyLock<Vec<(Regex, &'static str)>> = LazyLock::new(|| {
    vec![
        (
            Regex::new(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z").unwrap(),
            "%Y-%m-%dT%H:%M:%S.%fZ",
        ),
        (
            Regex::new(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z").unwrap(),
            "%Y-%m-%dT%H:%M:%SZ",
        ),
        (
            Regex::new(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}").unwrap(),
            "%Y-%m-%d %H:%M:%S",
        ),
        (
            Regex::new(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}").unwrap(),
            "%Y/%m/%d %H:%M:%S",
        ),
    ]
});

static LEVEL_PATTERN: LazyLock<Regex> =
    LazyLock::new(|| Regex::new(r"\[?(INFO|ERROR|WARN|DEBUG|TRACE|FATAL)\]?").unwrap());

The patterns are identical to the Python version. LazyLock (in the standard library since Rust 1.80) ensures the regexes are compiled once on first use and reused across calls - similar to defining compiled regexes at the module level in Python.
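For comparison, here is what module-level compiled patterns could look like on the Python side. The regexes and format strings are copied from the Rust statics above. Everything else is an assumption for illustration: the bodies of parse_timestamp and parse_level are not shown in this article, so the way the remaining text is computed here (cutting out the matched span) is a guess, not the series' actual log_parser.py.

```python
import re
from datetime import datetime

# Compiled once at import time, like the LazyLock statics in Rust.
TIMESTAMP_PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z"), "%Y-%m-%dT%H:%M:%S.%fZ"),
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"), "%Y-%m-%dT%H:%M:%SZ"),
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}"), "%Y/%m/%d %H:%M:%S"),
]
LEVEL_PATTERN = re.compile(r"\[?(INFO|ERROR|WARN|DEBUG|TRACE|FATAL)\]?")


def parse_timestamp(line: str):
    """Return (datetime or None, remaining text). Assumed behavior."""
    for pattern, fmt in TIMESTAMP_PATTERNS:
        match = pattern.search(line)
        if match is None:
            continue
        try:
            parsed = datetime.strptime(match.group(0), fmt)
        except ValueError:
            # Matched the shape but not a valid date; try the next pattern.
            continue
        return parsed, line[: match.start()] + line[match.end():]
    return None, line


def parse_level(line: str):
    """Return (level or None, remaining text). Assumed behavior."""
    match = LEVEL_PATTERN.search(line)
    if match is None:
        return None, line
    return match.group(1), line[: match.start()] + line[match.end():]


ts, rest = parse_timestamp("2024-01-15T10:30:00.250Z [ERROR] code=500")
level, rest = parse_level(rest)
print(ts, level, rest.strip())
```

The order of the patterns matters: the variant with fractional seconds has to be tried before the one without.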

The main function exposed to Python is parse_file:

#[pyfunction]
fn parse_file<'py>(py: Python<'py>, path: &str) -> PyResult<Bound<'py, PyList>> {
    let content = fs::read_to_string(path).map_err(|e| {
        if e.kind() == std::io::ErrorKind::NotFound {
            pyo3::exceptions::PyFileNotFoundError::new_err(format!(
                "No such file or directory: '{}'",
                path
            ))
        } else {
            pyo3::exceptions::PyOSError::new_err(e.to_string())
        }
    })?;

    let results = PyList::empty(py);

    for raw_line in content.lines() {
        if is_noise_line(raw_line) {
            continue;
        }

        let (ts_result, remaining) = parse_timestamp(raw_line);
        let (level, remaining) = parse_level(&remaining);

        if ts_result.is_none() && level.is_none() {
            continue;
        }

        let fields = parse_fields(py, &remaining)?;

        let entry = PyDict::new(py);

        match ts_result {
            Some(ts) => {
                let datetime_mod = py.import("datetime")?;
                let datetime_cls = datetime_mod.getattr("datetime")?;
                let py_ts = datetime_cls.call1((
                    ts.year,
                    ts.month,
                    ts.day,
                    ts.hour,
                    ts.minute,
                    ts.second,
                    ts.microsecond,
                ))?;
                entry.set_item("timestamp", py_ts)?;
            }
            None => {
                entry.set_item("timestamp", py.None())?;
            }
        }

        match &level {
            Some(l) => entry.set_item("level", l.as_str())?,
            None => entry.set_item("level", py.None())?,
        }

        entry.set_item("fields", &fields)?;
        entry.set_item("raw", raw_line)?;

        results.append(entry)?;
    }

    Ok(results)
}

#[pymodule]
fn _rust_log_parser(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(parse_file, m)?)?;
    Ok(())
}

Notice a few important details:

#[pyfunction] and #[pymodule] are PyO3 macros: the first makes a Rust function callable from Python, and the second defines the module that Python imports and registers the function in it.
The function returns a PyList of PyDict - Python-native types. This is what makes the interop seamless.
FileNotFoundError is raised explicitly to match the Python implementation's behavior. This is critical because our contract tests check for it.
Timestamps are constructed using Python's own datetime.datetime via py.import("datetime"). This ensures exact type compatibility with the Python implementation.
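The error mapping is easier to follow when we look at what Python itself raises for a missing file. A small sketch, with read_error_type as a hypothetical helper written only for this example:

```python
import os
import tempfile


def read_error_type(path: str):
    """Return the exception class raised when reading path, or None on success."""
    try:
        with open(path, encoding="utf-8") as handle:
            handle.read()
    except OSError as exc:
        return type(exc)
    return None


# A freshly created temporary directory contains no files,
# so this path is guaranteed not to exist.
with tempfile.TemporaryDirectory() as tmp:
    missing_path = os.path.join(tmp, "missing.log")
    error_type = read_error_type(missing_path)

print(error_type.__name__)            # FileNotFoundError
print(issubclass(error_type, OSError))  # True
```

Because FileNotFoundError is a subclass of OSError, mapping the NotFound kind to PyFileNotFoundError and every other I/O error to PyOSError keeps the Rust function inside the same exception hierarchy that Python's open uses.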

The most interesting helper is parse_fields, which handles key=value pairs including nested objects and type coercion:

fn parse_fields<'py>(py: Python<'py>, line: &str) -> PyResult<Bound<'py, PyDict>> {
    let dict = PyDict::new(py);
    let line = line.trim();
    let chars: Vec<char> = line.chars().collect();
    let len = chars.len();
    let mut i = 0;

    while i < len {
        if chars[i] == ' ' || chars[i] == '\t' {
            i += 1;
            continue;
        }

        let eq_pos = match line[i..].find('=') {
            Some(pos) => i + pos,
            None => break,
        };

        let key: String = chars[i..eq_pos].iter().collect();
        let key = key.trim();

        if key.contains(' ') {
            let space_pos = line[i..].find(' ');
            match space_pos {
                Some(pos) => {
                    i = i + pos + 1;
                    continue;
                }
                None => break,
            }
        }

        i = eq_pos + 1;

        if i < len && chars[i] == '{' {
            let start = i;
            let mut brace_count = 1;
            i += 1;
            while i < len && brace_count > 0 {
                if chars[i] == '{' {
                    brace_count += 1;
                } else if chars[i] == '}' {
                    brace_count -= 1;
                }
                i += 1;
            }
            let nested_str: String = chars[start..i].iter().collect();
            let nested_val = parse_nested(py, &nested_str)?;
            dict.set_item(key, nested_val)?;
        } else if i < len && chars[i] == '"' {
            i += 1;
            let start = i;
            while i < len && chars[i] != '"' {
                if chars[i] == '\\' && i + 1 < len {
                    i += 2;
                } else {
                    i += 1;
                }
            }
            let value: String = chars[start..i].iter().collect();
            dict.set_item(key, value)?;
            if i < len {
                i += 1;
            }
        } else {
            let start = i;
            while i < len && chars[i] != ' ' && chars[i] != '\t' {
                i += 1;
            }
            let value: String = chars[start..i].iter().collect();
            if value.is_empty() {
                dict.set_item(key, py.None())?;
            } else {
                set_coerced_value(&dict, key, &value)?;
            }
        }
    }

    Ok(dict)
}

The function walks through the remaining text character by character, branching on what follows the =:

{ → parse a nested object recursively
" → extract a quoted string, handling escaped quotes
anything else → extract an unquoted value and coerce its type (booleans, ints, floats, or strings)

This mirrors the Python implementation's _parse_fields method, but uses explicit index tracking instead of regex splits. The full source, including all helpers, is available in the repository.
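To make the index-tracking idea concrete, here is a simplified Python rendering of the same scan. It is a sketch, not the code from the repository: it leaves out nested objects entirely, and the coercion rules in coerce (exact matches for true, false and null, then int, then float) are assumptions, because set_coerced_value is not shown above.

```python
def coerce(value: str):
    """Assumed coercion rules: booleans, null, ints, floats, else string."""
    if value == "true":
        return True
    if value == "false":
        return False
    if value == "null":
        return None
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    return value


def parse_fields(line: str) -> dict:
    """Scan key=value pairs with an explicit index (no nested objects)."""
    fields = {}
    i, n = 0, len(line)
    while i < n:
        if line[i] in " \t":
            i += 1
            continue
        eq = line.find("=", i)
        if eq == -1:
            break
        key = line[i:eq].strip()
        if " " in key:
            # Free text in front of the next key: skip one word and retry.
            space = line.find(" ", i)
            if space == -1:
                break
            i = space + 1
            continue
        i = eq + 1
        if i < n and line[i] == '"':
            i += 1
            start = i
            while i < n and line[i] != '"':
                # A backslash escapes the next character, e.g. \" inside a value.
                i += 2 if line[i] == "\\" and i + 1 < n else 1
            fields[key] = line[start:i]  # escapes are kept as written
            i += 1  # step over the closing quote
        else:
            start = i
            while i < n and line[i] not in " \t":
                i += 1
            raw = line[start:i]
            fields[key] = None if raw == "" else coerce(raw)
    return fields


print(parse_fields('request failed user="alice" code=500 ok=false'))
```

Like the Rust version, the quoted branch only uses the backslash to avoid ending the value too early; it does not rewrite the escaped characters.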

Updated wrapper

The RustLogParser wrapper is now a thin layer that converts raw dicts from Rust into LogEntry instances:

import _rust_log_parser
from log_parser import LogEntry


class RustLogParser:

    def load(self, path: str) -> list[LogEntry]:
        raw_entries = _rust_log_parser.parse_file(path)
        return [
            LogEntry(
                timestamp=entry["timestamp"],
                level=entry["level"],
                fields=dict(entry["fields"]),
                raw=entry["raw"],
            )
            for entry in raw_entries
        ]

This is the wrapper pattern at work. Rust returns dicts (cheap to construct via PyO3), and the thin Python wrapper converts them into LogEntry dataclass instances. This way, both LogParser and RustLogParser expose the exact same public interface.
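The conversion step can be tried without the compiled extension by feeding it hand-written dicts. In this sketch, the LogEntry dataclass is a stand-in that I am assuming has the four fields used by the wrapper; the real definition lives in log_parser.py and may differ. The function name to_log_entries is also made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class LogEntry:
    """Stand-in for log_parser.LogEntry (assumed fields)."""

    timestamp: Optional[datetime]
    level: Optional[str]
    fields: dict[str, Any] = field(default_factory=dict)
    raw: str = ""


def to_log_entries(raw_entries: list[dict]) -> list[LogEntry]:
    """Convert parse_file-style dicts into LogEntry instances."""
    return [
        LogEntry(
            timestamp=entry["timestamp"],
            level=entry["level"],
            fields=dict(entry["fields"]),  # copy, so callers own their dict
            raw=entry["raw"],
        )
        for entry in raw_entries
    ]


# Hand-written input in the shape the Rust function returns.
fake_rust_output = [
    {
        "timestamp": datetime(2024, 1, 15, 10, 30, 0),
        "level": "INFO",
        "fields": {"code": 200},
        "raw": "2024-01-15 10:30:00 [INFO] code=200",
    }
]
entries = to_log_entries(fake_rust_output)
print(entries[0])
```

Because dataclasses compare by value, entries produced through this path can be checked against expected LogEntry values with a plain ==, which is what lets one set of assertions serve both implementations.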

pyproject.toml

Finally, we need to tell Python how to build the Rust extension:

[build-system]
requires = ["maturin>=1.0,<2.0"]
build-backend = "maturin"

[project]
name = "python-to-rust"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = []

[dependency-groups]
dev = [
    "pytest>=9.0.2",
    "maturin>=1.0,<2.0",
]

[tool.maturin]
features = ["pyo3/extension-module"]

The key changes are the [build-system] section (maturin as the build backend) and [tool.maturin] section (enabling the PyO3 extension module feature).

Build and test

Now let's compile and run the tests:

uv run maturin develop

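This compiles the Rust code and installs the resulting extension module into our virtualenv. By default, maturin develop produces an unoptimized debug build, which is fine for running the tests; pass --release when measuring performance. After that, we run the tests: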

uv run pytest test_log_parser.py -v

All 74 tests pass - 37 for TestLogParserPython and 37 for TestLogParserRust. Both implementations are running the exact same contract tests, so as far as the suite can tell, the Rust implementation is behaviorally equivalent to the Python one.

This is exactly the confidence we've been building toward since part 1. We started with a single test, expanded to a comprehensive test suite, restructured into contract tests, and now both implementations are verified against the same spec.

Conclusion

In this article, we completed the rewrite. The AI agent ported the parsing logic from Python to Rust, configured the build, and updated the wrapper. The contract tests from part 2 proved their value - they gave us immediate, automated feedback that the Rust implementation matches the Python one.

The approach we followed throughout the series:

Part 1: Expanded test coverage so we actually know what behavior to preserve
Part 2: Restructured tests into contracts so both implementations share the same spec
Part 3: Implemented the Rust version and verified it against that spec

Without comprehensive, shared tests, we'd be reviewing Python and Rust code side by side, trying to spot differences. With contract tests, we just run pytest and get an automated answer for every behavior the suite covers.

Hint: There's a subtle bug hidden somewhere in the Rust implementation. Try to find it. We'll show how to find it using property-based testing in the last part of this series.

Until then, happy engineering!

All the techniques and approaches that we're using in this series are explained in detail inside Complete Python Testing Guide. That includes real-world examples such as CRM and email integration, together with ready-to-use AI agent instruction files that will help you generate high-value tests with AI agents.
