Devin AI vs Engine - Compare Software Engineer Tools
Devin and Engine are both capable AI software engineer tools, but each suits different use cases. Engine is a viable alternative to Devin, especially for established teams. This article will help you decide between them based on five factors:
- Code quality
- Speed
- Security
- Pricing
- Developer experience
Code Quality
At the time of writing, Engine Labs is in 4th place on the SWE-bench Verified benchmark, within a few percentage points of first position. This means Engine's performance as an AI software engineer is independently recognised as state of the art.
Engine achieves this with state-of-the-art reasoning and agentic behaviour built on custom workflow integrations and a deep understanding of complex multi-repo codebases.
Engine does not use proprietary models or fine-tunes. Instead, it always runs on the latest available foundation models, like Claude 3.5 Sonnet, which have historically outperformed use-case-specific models.
It's unclear how Devin AI performs on independent benchmarks. In an unverified run on the full SWE-bench in early 2024, Devin scored 13.86%, which was state of the art at the time. It has not been possible to tell how its benchmark performance has changed since then, because Cognition AI does not release verifiable benchmark scores.
Cognition does not clearly disclose which models Devin uses, but it's rumoured that Devin is a fine-tune of GPT-4o, trained specifically to improve coding performance. Whilst GPT-4o is a powerful and relatively cheap model, it has been far superseded by more recent models like Claude 3.5 Sonnet and o1. These models easily outperform even a fine-tune of GPT-4o.
Speed
Both tools operate mostly asynchronously so speed should not be a dealbreaker. They will both usually get you a result faster than an average software engineer.
Devin AI occasionally hits speed bumps because it uses AI to execute basic tasks like making a pull request, rather than having integrations like these hard-coded. In our experience, this causes long loops of trial and error while it attempts to work with third-party workflow tools.
Devin can complete coding tasks reasonably quickly: a simple task can take as little as a couple of minutes. However, if Devin hits a dead end, it can get stuck trying and failing to fix its own buggy or broken code. The task then runs for some time without reaching a good result. This is likely a consequence of using models without sophisticated reasoning or planning features.
In general use, Devin's UI is snappy and responsive.
Engine is typically as fast as Devin AI for most tasks but gets stuck less often. This is for two reasons. First, Engine has carefully designed integrations with GitHub, GitLab and all popular workflow tools. This means that, for example, the AI does not have to try (and sometimes fail) to make a GitHub pull request. Second, Engine uses models that offer planning and reasoning out of the box.
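To illustrate what a hard-coded integration means in practice, here is a minimal sketch of a fixed code path that builds a GitHub "create pull request" call using the public REST endpoint. This is not Engine's actual implementation: the function name, parameters, and example values are hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_pull_request(owner, repo, head, base, title, token):
    """Build (but do not send) a GitHub 'create pull request' API request.

    Hypothetical helper for illustration only. It targets the public
    endpoint POST /repos/{owner}/{repo}/pulls.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    payload = {"title": title, "head": head, "base": base}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example values below are placeholders, not real repositories or tokens.
request = build_pull_request(
    owner="example-org",
    repo="example-repo",
    head="feature/fix-login",
    base="main",
    title="Fix login redirect",
    token="PLACEHOLDER_TOKEN",
)
```

Because the call is deterministic, there is nothing for the model to get wrong at this step: the agent only supplies the branch names and title, and the integration handles the rest.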
Security
In our view, a basic part of keeping customers' codebases private and safe is not using their code and inputs to train models. Cognition's terms explicitly allow them to do this with your code unless you opt out.
Engine never uses your data, code, or inputs to train any models. Since we use third-party models, we have no need to do this. All of our third-party model APIs prevent your data from being used in this way. You can take this further by using Engine with models deployed inside your own Amazon VPC (Devin AI also offers this).
Additionally, Devin was recently involved in a public security breach. A streamer was using Devin AI on a livestream and exposed the URL of the container on which Devin was running. This URL was unauthenticated and publicly accessible, allowing anyone to access the project and its code.
Pricing
Devin is priced at $500 per month. This includes a monthly usage allowance, based on compute and AI, which translates to approximately 60 hours of work. Some tasks can take several hours, and there are examples of Devin taking over half an hour to submit a pull request. However, it appears that a typical task takes around 15-30 minutes end-to-end.
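As a rough back-of-envelope sketch using only the figures above ($500 per month, approximately 60 hours, 15-30 minutes per typical task), the implied cost per task can be estimated as follows. The constant and function names are illustrative, and the calculation assumes usage is consumed linearly with task time, which may not match how Devin actually meters usage.

```python
# Figures taken from the paragraph above; all are approximate.
MONTHLY_PRICE_USD = 500
INCLUDED_HOURS = 60


def cost_per_task(task_minutes):
    """Implied cost of one task, assuming usage scales linearly with time."""
    hourly_rate = MONTHLY_PRICE_USD / INCLUDED_HOURS
    return hourly_rate * task_minutes / 60


def tasks_per_month(task_minutes):
    """Whole tasks that fit in the monthly allowance at a given duration."""
    return (INCLUDED_HOURS * 60) // task_minutes


for minutes in (15, 30):
    print(
        f"{minutes}-minute tasks: {tasks_per_month(minutes)} per month, "
        f"about ${cost_per_task(minutes):.2f} each"
    )
```

Under these assumptions, the allowance covers roughly 120 to 240 typical tasks per month, at about $2 to $4 per task. Tasks that get stuck for hours would cost proportionally more.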
Devin AI is competitively priced. This is likely due to the use of a relatively cheap fine-tuned model under the hood. Take advantage of this for tinkering or simple engineering tasks.
Engine is priced for engineering teams that use professional workflows, and it runs on frontier foundation models. Pricing is on a per-task basis (one task is one PR), so you know what your usage will be before you start work.
All Engine packages include implementation and always-available support, which is why custom pricing applies.
Developer Experience
The two products differ in developer experience, each catering to a slightly different market. Both run independently and asynchronously in the cloud, freeing you up to work on other things. There are other great options if you want a more collaborative developer experience. Both are highly configurable and have environments where you can see what the AI is doing and work alongside it.
Engine excels at integrating with workflow tools like Jira, Trello, and Linear, and with version control tools like GitHub and GitLab. A seamless workflow provides the same experience as working with a junior engineer on your team, albeit with faster turnaround!
Devin only offers a Slack integration, which limits how you can interact with it. It's a more casual interface for users who don't have well defined workflows.
Which is best?
Devin is great for very small teams getting started with AI-powered engineering or individuals looking for an AI teammate.
Engine is a more comprehensive offering with state-of-the-art performance, aimed at small to large professional engineering teams. Easier integration with custom workflows sets it apart.