AI-Powered Rewrite from Python to Rust - part 2

In the previous part, we improved our LogParser by adding the missing tests. This was the first step towards a confident rewrite. Now that the code is well tested, we need to set up the test suite so that it evaluates the Rust implementation with the same tests. If both implementations pass the same suite, we can be confident that they behave the same way for every case the tests cover, with the Rust version being the faster of the two.

This article is part 2 of the series When Python Hits the Wall: AI-Powered Rewrite from Python to Rust

You can find the repository with all examples here

Two objects, same set of tests

At this point, we have tests for our Python implementation. That's great, but there is a problem: as written, the tests exercise only the Python implementation. We would like to run the same set of tests against both implementations, and we can do that with contract tests. You can think of a contract test as a test template with a single variable: the implementation. In Python, that's easy to do. We define a contract class that implements all the tests and declares an abstract factory fixture. Then, for each implementation, we create a test class that inherits from the contract and overrides that fixture.
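To make the "template with a single variable" idea concrete before applying it to the parser, here is a minimal, framework-free sketch. It is only an illustration: the stack classes, `make_stack`, and the tiny `run_contract` runner are hypothetical names invented for this example, and the runner stands in for what pytest does when it collects the subclasses.

```python
from collections import deque


class StackContract:
    """All tests live here; subclasses only decide which implementation is built."""

    def make_stack(self):
        # The single "variable" of the template. Each subclass overrides it.
        raise NotImplementedError

    def test_new_stack_is_empty(self):
        assert len(self.make_stack()) == 0

    def test_pop_returns_last_pushed(self):
        stack = self.make_stack()
        stack.push(1)
        stack.push(2)
        assert stack.pop() == 2


class ListStack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


class DequeStack:
    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


class ListStackTests(StackContract):
    def make_stack(self):
        return ListStack()


class DequeStackTests(StackContract):
    def make_stack(self):
        return DequeStack()


def run_contract(test_class):
    """Call every test_* method and return the names that ran without error.

    An AssertionError from a failing test propagates to the caller.
    """
    suite = test_class()
    passed = []
    for name in sorted(dir(suite)):
        if name.startswith("test_"):
            getattr(suite, name)()
            passed.append(name)
    return passed


if __name__ == "__main__":
    for cls in (ListStackTests, DequeStackTests):
        print(cls.__name__, run_contract(cls))
```

Adding a third implementation would mean adding one more small subclass; the tests themselves are written once.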

There's a chapter dedicated to a deep dive into contract tests in the Complete Python Testing Guide.

So let's instruct our AI to update the tests:

  Rewrite the tests in test_log_parser.py into the contract test pattern:                                                                                                                                                                                           

  1. Create a LogParserContract base class that holds all the tests as methods. It should define a parser pytest fixture that raises NotImplementedError. Each test method receives the parser via this fixture.                                                    
  2. Create TestLogParserPython(LogParserContract) that overrides the parser fixture to return LogParser(). All existing tests should pass through this class.                                                                                                      
  3. Create a RustLogParser wrapper class in a new file, rust_log_parser.py, with the same public interface as LogParser (load(path) -> list[LogEntry]). For now, it should just return an empty list from load.                                                       
  4. Create TestLogParserRust(LogParserContract) that overrides the parser fixture to return RustLogParser(). These tests will fail because the Rust implementation hasn't been built yet.                                                                                  

  The create_log_file fixture should remain a module-level fixture shared by both test classes. The contract class itself should not be collected by pytest (i.e., it should not be prefixed with Test).

Use the auto-accept-edits mode.

After running the prompt, the AI will refactor the tests into something like this:

import os
import tempfile
from datetime import datetime

import pytest

from log_parser import LogParser
from rust_log_parser import RustLogParser


@pytest.fixture
def create_log_file():
    paths = []

    def _create(*lines: str) -> str:
        with tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) as f:
            for line in lines:
                f.write(line + "\n")
            paths.append(f.name)
        return paths[-1]

    yield _create

    for path in paths:
        os.unlink(path)


class LogParserContract:
    @pytest.fixture
    def parser(self):
        raise NotImplementedError

    # --- Timestamp formats (parametrized) ---

    @pytest.mark.parametrize(
        "line, expected_ts",
        [
            (
                "2024-01-15T10:23:45.123Z [INFO] service=auth",
                datetime(2024, 1, 15, 10, 23, 45, 123000),
            ),
            (
                "2024-01-15T10:23:45Z [INFO] service=auth",
                datetime(2024, 1, 15, 10, 23, 45),
            ),
            (
                "2024-01-15 10:23:45 [INFO] service=auth",
                datetime(2024, 1, 15, 10, 23, 45),
            ),
            (
                "2024/01/15 10:23:45 [INFO] service=auth",
                datetime(2024, 1, 15, 10, 23, 45),
            ),
        ],
        ids=[
            "iso8601_fractional",
            "iso8601_no_fractional",
            "datetime_dashes",
            "datetime_slashes",
        ],
    )
    def test_timestamp_formats(self, parser, create_log_file, line, expected_ts):
        path = create_log_file(line)
        entries = parser.load(path)
        assert len(entries) == 1
        assert entries[0].timestamp == expected_ts

    # --- All log levels ---

    @pytest.mark.parametrize("level", ["INFO", "ERROR", "WARN", "DEBUG", "TRACE", "FATAL"])
    def test_all_levels(self, parser, create_log_file, level):
        path = create_log_file(f"2024-01-15T10:23:45Z [{level}] service=app")
        entries = parser.load(path)
        assert len(entries) == 1
        assert entries[0].level == level

    @pytest.mark.parametrize("level", ["INFO", "ERROR", "WARN", "DEBUG", "TRACE", "FATAL"])
    def test_level_without_brackets(self, parser, create_log_file, level):
        path = create_log_file(f"2024-01-15T10:23:45Z {level} service=app")
        entries = parser.load(path)
        assert entries[0].level == level

    # --- Nested field parsing ---

    def test_nested_fields(self, parser, create_log_file):
        path = create_log_file(
            '2024-01-15T10:23:45Z [ERROR] details={host="ldap-1.internal",port=636,ssl=true}'
        )
        entries = parser.load(path)
        assert len(entries) == 1
        details = entries[0].fields["details"]
        assert details["host"] == "ldap-1.internal"
        assert details["port"] == 636
        assert details["ssl"] is True

    def test_nested_fields_with_false(self, parser, create_log_file):
        path = create_log_file(
            "2024-01-15T10:23:45Z [INFO] config={debug=false,retries=3}"
        )
        entries = parser.load(path)
        config = entries[0].fields["config"]
        assert config["debug"] is False
        assert config["retries"] == 3

    def test_nested_fields_with_float(self, parser, create_log_file):
        path = create_log_file(
            "2024-01-15T10:23:45Z [INFO] stats={avg=12.5,count=100}"
        )
        entries = parser.load(path)
        stats = entries[0].fields["stats"]
        assert stats["avg"] == 12.5
        assert stats["count"] == 100

    # --- Noise lines ignored ---

    def test_empty_lines_ignored(self, parser, create_log_file):
        path = create_log_file(
            "2024-01-15T10:23:45Z [INFO] service=auth",
            "",
            "   ",
            "2024-01-15T10:23:46Z [ERROR] service=payment",
        )
        entries = parser.load(path)
        assert len(entries) == 2

    def test_separator_lines_ignored(self, parser, create_log_file):
        path = create_log_file(
            "2024-01-15T10:23:45Z [INFO] service=auth",
            "-- system restart at 2024-01-15T10:24:00Z --",
            "2024-01-15T10:23:46Z [ERROR] service=payment",
        )
        entries = parser.load(path)
        assert len(entries) == 2

    def test_line_without_timestamp_or_level_ignored(self, parser, create_log_file):
        path = create_log_file("just some random text with no structure")
        entries = parser.load(path)
        assert len(entries) == 0

    # --- Field types ---

    def test_string_fields(self, parser, create_log_file):
        path = create_log_file(
            '2024-01-15T10:23:45Z [INFO] msg="hello world" service=auth'
        )
        entries = parser.load(path)
        assert entries[0].fields["msg"] == "hello world"
        assert entries[0].fields["service"] == "auth"

    def test_quoted_string_with_escaped_quotes(self, parser, create_log_file):
        path = create_log_file(
            r'2024-01-15T10:23:45Z [ERROR] error="failed to parse \"config.json\"" service=app'
        )
        entries = parser.load(path)
        assert "config.json" in entries[0].fields["error"]

    def test_integer_field(self, parser, create_log_file):
        path = create_log_file("2024-01-15T10:23:45Z [INFO] duration_ms=150 user_id=42")
        entries = parser.load(path)
        assert entries[0].fields["duration_ms"] == 150

    def test_float_field(self, parser, create_log_file):
        path = create_log_file("2024-01-15T10:23:45Z [INFO] amount=99.99")
        entries = parser.load(path)
        assert entries[0].fields["amount"] == 99.99

    def test_boolean_fields(self, parser, create_log_file):
        path = create_log_file("2024-01-15T10:23:45Z [INFO] success=true failed=false")
        entries = parser.load(path)
        assert entries[0].fields["success"] is True
        assert entries[0].fields["failed"] is False

    def test_empty_value_field(self, parser, create_log_file):
        path = create_log_file("2024-01-15T10:23:45Z [INFO] user_id= service=auth")
        entries = parser.load(path)
        assert entries[0].fields["user_id"] is None

    def test_mixed_field_types_in_one_line(self, parser, create_log_file):
        path = create_log_file(
            '2024-01-15T10:23:45Z [INFO] service=auth user_id=42 amount=9.99 success=true msg="ok"'
        )
        entries = parser.load(path)
        fields = entries[0].fields
        assert fields["service"] == "auth"
        assert fields["user_id"] == 42
        assert fields["amount"] == 9.99
        assert fields["success"] is True
        assert fields["msg"] == "ok"

    # --- Multiline files ---

    def test_multiline_file(self, parser, create_log_file):
        path = create_log_file(
            "2024-01-15T10:23:45.123Z [INFO] service=auth user_id=42 action=login duration_ms=150 status=success",
            '[ERROR] 2024-01-15T10:23:45.456Z service=payment action=charge amount=99.99 error="timeout"',
            "2024-01-15 10:23:46 WARN service=auth retry_count=3",
            '2024-01-15T10:23:47.001Z [DEBUG] service=gateway msg="Health check passed"',
            "",
            "-- separator --",
            "2024/01/15 10:24:01 [INFO] service=gateway action=startup",
        )
        entries = parser.load(path)
        assert len(entries) == 5
        assert entries[0].level == "INFO"
        assert entries[1].level == "ERROR"
        assert entries[2].level == "WARN"
        assert entries[3].level == "DEBUG"
        assert entries[4].level == "INFO"

    def test_timestamp_after_level(self, parser, create_log_file):
        path = create_log_file(
            "[ERROR] 2024-01-15T10:23:45.456Z service=payment"
        )
        entries = parser.load(path)
        assert len(entries) == 1
        assert entries[0].level == "ERROR"
        assert entries[0].timestamp == datetime(2024, 1, 15, 10, 23, 45, 456000)

    def test_raw_line_preserved(self, parser, create_log_file):
        line = "2024-01-15T10:23:45Z [INFO] service=auth"
        path = create_log_file(line)
        entries = parser.load(path)
        assert entries[0].raw == line

    # --- Edge cases ---

    def test_load_nonexistent_file(self, parser):
        with pytest.raises(FileNotFoundError):
            parser.load("/nonexistent/path/to/file.log")

    def test_level_only_no_timestamp(self, parser, create_log_file):
        path = create_log_file("[INFO] service=auth action=login")
        entries = parser.load(path)
        assert len(entries) == 1
        assert entries[0].level == "INFO"
        assert entries[0].timestamp is None
        assert entries[0].fields["service"] == "auth"

    def test_level_only_no_fields(self, parser, create_log_file):
        path = create_log_file("2024-01-15T10:23:45Z [INFO]")
        entries = parser.load(path)
        assert len(entries) == 1
        assert entries[0].level == "INFO"
        assert entries[0].fields == {}

    def test_nested_field_with_comma_in_quoted_value(self, parser, create_log_file):
        path = create_log_file(
            '2024-01-15T10:23:45Z [INFO] ctx={msg="hello, world",count=1}'
        )
        entries = parser.load(path)
        ctx = entries[0].fields["ctx"]
        assert ctx["msg"] == "hello, world"
        assert ctx["count"] == 1

    def test_fields_with_spaces_in_key_skipped(self, parser, create_log_file):
        path = create_log_file(
            "2024-01-15T10:23:45Z [INFO] some garbage key=value"
        )
        entries = parser.load(path)
        assert entries[0].fields["key"] == "value"
        assert "some garbage" not in entries[0].fields


class TestLogParserPython(LogParserContract):
    @pytest.fixture
    def parser(self):
        return LogParser()


class TestLogParserRust(LogParserContract):
    @pytest.fixture
    def parser(self):
        return RustLogParser()

We now have exactly the same set of tests as before. The tests themselves haven't changed; they are just organized using the contract test pattern. There's also a stub for RustLogParser that intentionally returns an empty list, so the Rust tests fail, which confirms that the contract is really being run against the new class:

from log_parser import LogEntry


class RustLogParser:

    def load(self, path: str) -> list[LogEntry]:
        return []
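The prompt asks for "the same public interface" as LogParser. One optional way to write that expectation down in code is a structural type. The sketch below is an assumption on my part, not something from the repository: `SupportsLoad`, `StubParser`, `NotAParser`, and `has_parser_interface` are hypothetical names, and the return type is simplified to a plain `list` so the example runs without the project files.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsLoad(Protocol):
    """Hypothetical name for the one-method interface both parsers share."""

    def load(self, path: str) -> list:
        ...


class StubParser:
    # Stand-in for the RustLogParser stub so this sketch is self-contained.
    def load(self, path: str) -> list:
        return []


class NotAParser:
    # Has a similar method under a different name, so it does not match.
    def parse(self, path: str) -> list:
        return []


def has_parser_interface(obj: object) -> bool:
    # A runtime_checkable protocol only checks that the method exists.
    # It does not verify the signature or the return type.
    return isinstance(obj, SupportsLoad)


if __name__ == "__main__":
    print(has_parser_interface(StubParser()))
    print(has_parser_interface(NotAParser()))
```

A check like this only guards the shape of the interface. The behavior is still pinned down by the contract tests.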

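If we run the tests now, the ones for the Python implementation will pass, as before. Almost all of the ones for the Rust implementation will fail, as no work has been done yet. The one exception is test_line_without_timestamp_or_level_ignored, which expects zero entries and therefore passes against the empty stub. That's exactly where we want to be before starting the Rust implementation.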
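It is worth knowing why a stub that returns an empty list can still turn a test green: any test that expects no entries is satisfied by it. The sketch below reproduces this in isolation. `EmptyStubParser`, `passes`, and the two check functions are illustrative names that mirror the assertions of two contract tests; they are not part of the project.

```python
class EmptyStubParser:
    """Stand-in for the RustLogParser stub: ignores the file, returns nothing."""

    def load(self, path: str) -> list:
        return []


def passes(check) -> bool:
    """Run one assertion-style check and report whether it passed."""
    try:
        check()
    except (AssertionError, IndexError):
        return False
    return True


def expects_no_entries():
    # Mirrors test_line_without_timestamp_or_level_ignored.
    entries = EmptyStubParser().load("any.log")
    assert len(entries) == 0


def expects_one_entry():
    # Mirrors the first assertion of most other contract tests.
    entries = EmptyStubParser().load("any.log")
    assert len(entries) == 1


if __name__ == "__main__":
    print("expects_no_entries passes:", passes(expects_no_entries))
    print("expects_one_entry passes:", passes(expects_one_entry))
```

A green result for a "nothing parsed" test says little on its own, so it is the failing tests that tell us the stub is wired into the contract correctly.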

Conclusion

In this article, we looked at contract tests. We used them to ensure that each object of interest passes exactly the same set of tests. This will give us confidence when writing a Rust implementation, as the AI agent can run tests during development to ensure things work as expected. Next time, we'll actually implement the same behavior in Rust.

Until then, happy engineering!
