Why I Choose Python Over n8n

I do not choose Python because n8n is bad.

That would be too easy, and frankly not true.

n8n has a very real charm. The first time you watch a workflow appear on a canvas, it feels like automation has finally become visible in the best possible way. For document-heavy industries, that visibility certainly is not a small thing. Finance teams, legal operations, back offices, insurance teams, compliance teams: they do not want to hear about pipelines, dependencies, retry logic, or vector indexes. They want to actually see where the invoice enters, where the contract is read, where the extracted fields are checked, and where the final result lands.

That is why I still get genuinely excited when I see n8n used well. Practical examples show n8n being useful for invoice processing, OCR, AI parsing, validation, Google Sheets updates, and ready-made OCR → AI parsing → validation workflows. GrowwStacks describes cutting the monthly processing of 120 invoices from 20 hours down to just 3! Official n8n templates include an AI invoice agent and an OCR, AI, and Google Sheets invoice processor. I would be dishonest if I pretended those examples did not matter. They show exactly why n8n became attractive in document automation in the first place.

But my disappointment starts at the same point my excitement does: the moment the beautiful workflow stops being a demo and starts becoming a system.

In document-heavy industries, the document is never “JUST a file.” It is a PDF with odd encoding. It is a scanned invoice with missing fields. It is a contract with 80 pages and one clause that matters. It is a base64 blob coming from one system, binary expected by another, and a memory spike sitting quietly in between. It is also not one document. It is a thousand today, ten thousand next month, and some unlucky batch on a Friday evening that exposes every hidden assumption in the workflow. In short, it is quite often a messy hell.

This post on Reddit is painfully familiar in spirit. The developer says n8n was not great at processing, moving, or transforming large or multiple files; built-in nodes fell short; workarounds became hacky or unreliable. The most concrete detail is the base64-to-binary workflow that kept losing connection because of memory issues. Recreated in Python, the same workflow had no more memory issues.
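That Reddit fix is easy to picture in code. A minimal sketch of how one might stream a base64 payload to disk in Python without ever holding the whole file in memory (the function name, chunk size, and the no-embedded-newlines assumption are all mine, not from the post):

```python
import base64

# 64 KiB, a multiple of 4 — base64 decodes in 4-byte groups, so each
# aligned chunk decodes independently of its neighbours.
CHUNK = 64 * 1024

def decode_base64_file(src_path: str, dst_path: str) -> None:
    """Stream-decode a base64 file to binary, keeping memory use flat.

    Assumes the payload has no embedded newlines; for line-wrapped MIME
    data, the stdlib's base64.decode(src, dst) streams line by line instead.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(base64.b64decode(chunk))
```

Nothing clever, but the memory ceiling is now the chunk size, not the document size, and that property is easy to verify in a test.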

That is the sort of story that makes me stop romanticising the canvas.

One can have patience for imperfect tools. But one has very little patience for unreliable document handling. When a workflow is processing invoices, claim forms, procurement documents, or internal compliance files, “MOSTLY works” becomes a dangerous phrase. The business user does not care whether the failure came from a node, browser limitation, hidden memory pressure, or an awkward binary conversion. They care that the document did not move, the field was not extracted, or the automation failed without a clean explanation.

Python is not magic here. Python can also be badly written. But Python gives me a cleaner place to be serious. I stream files, inspect memory behaviour, handle binary formats directly, isolate transformations, add structured logging, and test weird cases without dragging logic across a canvas. The relief is not theoretical. It is practical. I know where the file enters. I know where it changes shape. I know what exception was raised. I can reproduce the problem.
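“Structured logging” is worth making concrete. A minimal sketch of one JSON object per log line, so every pipeline event carries a document ID and stage you can grep for (the field names `doc_id` and `stage` are my illustration, not a standard):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line — machine-readable, greppable."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            # Extra context attached via logging's `extra=` mechanism
            "doc_id": getattr(record, "doc_id", None),
            "stage": getattr(record, "stage", None),
        }
        return json.dumps(payload)

logger = logging.getLogger("pipeline")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("extracted fields", extra={"doc_id": "INV-1042", "stage": "extract"})
```

When the Friday-evening batch fails, a line like `{"event": "extracted fields", "doc_id": "INV-1042", "stage": "extract"}` is the difference between an investigation and a guess.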

That leads to the second point: performance is not just speed; it is trust.

The GrowwStacks comparison is compelling because it has the kind of numbers that cut through opinion. The same AI agent was built twice, in pure Python and in n8n. The n8n version took 7 minutes and 37 seconds to process a day’s worth of Reddit discussions. The Python version completed the same task in just under 3 minutes. Over a month, that difference compounded: more than 3.5 hours of processing in n8n versus about 90 minutes in Python. Moreover, as the dataset grew, n8n struggled with browser-based processing limitations, while Python handled larger volumes without freezing or crashing!

I like this case because it does not say, “Python is cooler.” It says, “The workload grew, and the difference became operational.”

That is exactly how document automation behaves. At first, the workload is charmingly small. A few invoices. A pilot folder. A proof of concept for contract Q&A. A controlled batch of PDFs. Then someone sees the result and asks the fatal question: can we run this on all of them?

That is when I stop caring about how elegant the workflow looks (and, to be honest, it's not that hard these days to replicate almost any visual interface). What I start caring about is how it behaves under volume.

A seven-minute process is not always bad. A three-minute process is not always good. The number itself is not the whole story. What matters is whether the system gets slower predictably, whether it can be scaled, whether it crashes, whether it can be monitored, and whether the team can understand what happened when the data doubles.

n8n can be good at orchestration (though still not as robust as Python frameworks such as LangGraph). It can be good at connecting services. It can be good at giving teams a shared picture of a process. But when the core job is heavy text processing, embeddings, clustering, large-file processing, or model inference, I put my trust in Python. Zen van Riel, in his personal blog, makes the same pattern explicit: n8n wins for integrations, visual debugging, and prototyping; Python wins for core AI logic, testing, Git, debugging, and deeper customization.

This is not a minor distinction. It is the whole decision.

In document-heavy industries, I rarely want one giant visual workflow to do everything. I want the workflow to call the thing that does the hard work. That thing, for me, is usually Python.

The third reason is maintenance, and this is where my honest scepticism about n8n becomes sharper.

No-code tools have a strange failure mode: they do not always remove code. Sometimes they hide it badly.

This comparison on GrowwStacks describes an n8n implementation that required more than 200 lines of JavaScript spread across multiple function nodes! Developers had to write those code blocks without proper debugging tools. The article also says that when the logic needed modification six months later, the Python code’s organisation made the change straightforward, while the n8n workflow’s scattered JavaScript blocks would require rediscovering the logic from scratch. They called it the “no-code trap”: complex programming without the benefits of a proper development environment.

That phrase lands hard for me.

I do not mind code. I mind code that pretends it is not code.

If a workflow contains real business logic — validation rules, exception handling, document classification, field correction, confidence thresholds, retry strategy, routing rules — then I want that logic somewhere visible, testable, versioned, reviewable, and searchable. I do not want it scattered across tiny code nodes that look harmless on a canvas but behave like a fragmented application.
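What “visible, testable, versioned” looks like in practice is almost disappointingly plain. A minimal sketch, with invented rule names and an invented `Invoice` shape, of validation logic living as ordinary named functions rather than anonymous code nodes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Invoice:
    total: float
    tax_number: Optional[str]
    currency: str

# Each rule is a named, reviewable function that returns a violation
# message or None. A reviewer can comment on the exact line that
# became too permissive.
def rule_total_positive(inv: Invoice) -> Optional[str]:
    return None if inv.total > 0 else "total must be positive"

def rule_tax_number_present(inv: Invoice) -> Optional[str]:
    return None if inv.tax_number else "tax number missing"

def rule_known_currency(inv: Invoice) -> Optional[str]:
    allowed = {"EUR", "USD", "GBP"}
    return None if inv.currency in allowed else f"unknown currency {inv.currency}"

RULES = [rule_total_positive, rule_tax_number_present, rule_known_currency]

def validate(inv: Invoice) -> list:
    """Return all violated rules; an empty list means the invoice passes."""
    return [msg for rule in RULES if (msg := rule(inv)) is not None]
```

When the finance team adds a tolerance rule, it becomes one more named function in `RULES`, one diff in Git, and one new test case.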

This is especially painful in document-heavy companies because the rules are rarely stable. A supplier changes invoice format. A customer asks why one document was accepted and another was rejected. A regulator changes a requirement. A legal team wants a different clause extracted. A finance team adds a tolerance rule. Can a good developer implement these changes in Python? Easily.

This is where Python feels boring in the BEST possible way. A project is structured. Functions have names. Tests describe expectations. Logs tell a story. Git shows what changed. A reviewer comments on the exact line where the validation rule became too permissive. That does not sound glamorous, but it is the difference between an automation that can mature and an automation that becomes folklore.

And then AI makes the problem deeper...

Document AI is not just “send PDF to model, receive answer.” That may work for a prototype, and I enjoy prototypes. But production document AI is full of small, consequential decisions: how to split a document, what to embed, what metadata to preserve, how to retrieve, how to rerank, how to validate output, how to handle missing evidence, how to retry, how to decide whether the model is allowed to answer at all.
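Even the first of those decisions, how to split a document, is a real design choice. A minimal sketch of fixed-size chunking with overlap, one of several common strategies (the default sizes are arbitrary illustrations):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list:
    """Split text into overlapping chunks.

    A deliberately simple strategy: fixed windows with `overlap`
    characters of context carried into the next chunk. The point is
    that the parameters are explicit and the function is trivially
    testable and swappable for a smarter splitter later.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

Whether 800 characters with 100 of overlap is right for your contracts is exactly the kind of question you want to answer with a measurement, not a canvas default.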

This is why I pay attention to RAG-specific comparisons. Zen van Riel’s comparison is direct: for custom RAG — document ingestion, embedding, retrieval, custom reranking, streaming — the n8n approach hits walls, ends up with code nodes everywhere, and may as well be written in Python. The same article says that for AI systems where output quality matters, testable Python code is essential, with unit tests, integration tests, mocking, and CI/CD. Saksham Solanki reaches a compatible hybrid conclusion: use n8n for orchestration and LangChain/Python for deep agent logic.

That is exactly where I get excited about Python.

Not because it is more “technical.” That is the least interesting argument. I get excited because it lets the AI system become inspectable. I can test the chunking. I can compare retrieval strategies. I can write a regression test for the supplier invoice that previously broke extraction. I can mock the external OCR service. I can measure how often the system refuses to answer when the evidence is weak. I can keep the messy intelligence of the workflow in a place where engineering discipline still applies.
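The regression test and the mocked OCR service are the concrete payoff. A minimal sketch, with a toy regex extractor and an invented supplier fixture, of pinning down a layout that once broke extraction while keeping the external service out of the test:

```python
import re
from typing import Optional
from unittest.mock import Mock

def extract_total(ocr_text: str) -> Optional[float]:
    # Toy extractor for illustration: first "Total: <amount>" match wins
    m = re.search(r"Total:\s*([\d.]+)", ocr_text)
    return float(m.group(1)) if m else None

def process_invoice(pdf_path: str, ocr) -> Optional[float]:
    # `ocr` is injected as a parameter so tests can replace the real service
    return extract_total(ocr(pdf_path))

def test_supplier_layout_regression():
    # Fixture reproduces the (hypothetical) supplier layout that once
    # broke extraction; the OCR call never leaves the process.
    fake_ocr = Mock(return_value="Invoice 1042\nTotal: 1299.50\nThank you")
    assert process_invoice("fixtures/supplier_a.pdf", ocr=fake_ocr) == 1299.50
    fake_ocr.assert_called_once_with("fixtures/supplier_a.pdf")
```

Once that test exists, the supplier can change their format again, and the failure shows up in CI instead of in accounts payable.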

For document-heavy industries, quality is not decorative. If the system extracts the wrong payment terms, misses a liability clause, misreads a tax number, or routes an exception incorrectly, the automation has not merely “failed.” It has produced incorrect work that someone may still trust.

The final reason is production confidence.

I found this blunt observation on Reddit: n8n struggles when building truly reliable AI agents that need complex logic, persistent state, and robust error handling. The writer says that beyond basic AI use cases, reliability issues and limitations appear quickly; with Python, the system became more stable, flexible, professional, scalable, and maintainable.

I do not read that as an anti-n8n rant. I read it as a production boundary.

There is a moment in every automation project when the question changes. At first, the question is: “Can we automate this?” Later, it becomes: “Can we trust this when nobody is watching?”

That second question is the one I care about most.

A production document automation system needs to fail cleanly. It needs persistent state. It needs retries that do not create duplicate actions. It needs to remember what it processed. It needs to expose enough logs for investigation. It needs tests before changes. It needs version control. It needs graceful degradation when OCR, an LLM, a database, or an external API behaves badly.
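“Retries that do not create duplicate actions” is the item people most often get wrong. A minimal sketch of the idea, with invented names and an in-memory set standing in for what would be a persistent store in production:

```python
import time

class Processor:
    """Idempotent retries: remember processed document IDs so a retry
    after a partial failure never creates a duplicate action."""

    def __init__(self, action, max_attempts: int = 3):
        self.action = action
        self.max_attempts = max_attempts
        # In production this would be a durable store (database, Redis),
        # not a set that dies with the process.
        self.processed = set()

    def process(self, doc_id: str) -> bool:
        if doc_id in self.processed:
            return True  # already done; re-running is a safe no-op
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.action(doc_id)
                self.processed.add(doc_id)
                return True
            except Exception:
                if attempt == self.max_attempts:
                    return False  # fail cleanly; caller routes to review
                time.sleep(0.01 * 2 ** attempt)  # token backoff for the sketch
        return False
```

The canvas version of this exists too, but here the retry policy, the idempotency check, and the failure path are three visible lines you can test and reason about.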

My main observation about n8n: it can make complexity look manageable before it is actually managed.

Python does not guarantee good architecture by default; you need to build it. It does not force good quality; you need to engineer it (and, to be honest, you can do this extremely well and fast these days, with AI agents).

And that is my main excitement with Python: it does not hide the complexity. It gives me a place to work with it honestly.

So my position is not “never use n8n.” That would be lazy.

Use n8n when the problem is mostly orchestration, you have a small team and a predictable document workflow, and you want a fast no-code solution for a prototype.

But when the work becomes document-heavy in the serious sense — large files, messy transformations, growing batches, extraction quality, persistent agent state, long-term maintenance, and processes and outputs that must align with the company's knowledge bases and databases — I choose Python.

I choose it not because I enjoy writing more code, but because I dislike pretending that complex systems are simpler than they are.
