We put three leading AI coding assistants through hands-on testing to see which performs best. Claude, GPT-4, and GitHub Copilot face off in two practical programming challenges: implementing a Snake game and creating an electron cloud simulation.
Note: This comparison uses earlier versions of these models. Since then, both Claude and ChatGPT have released updated versions with improved capabilities. See our updated comparison with Claude 3.5 Sonnet and ChatGPT o1 for the latest results. We’ve since stopped making these version-specific comparisons because new AI models are released so frequently that the results become outdated within months.
Challenge 1: Snake Game Implementation
Claude 3.5 Performance:
– Created a working initial version on the first attempt
– Required 6 iterations to fix all issues
– Fixes across those iterations included:
– Preventing 180-degree turns (sketched in the code after this challenge)
– Handling rapid key inputs
– Proper collision detection
– The Artifacts feature made comparing versions easy
GPT-4 Performance:
– Also created a working initial version
– After 8 iterations, still couldn’t fix all issues
– Persistent problems with:
– Key input handling
– Game window not opening
– Less convenient interface for multiple iterations
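To make those Snake fixes concrete, here is a minimal sketch of the kind of direction handling both models had to get right. This is my own illustration in pygame, not the code either assistant produced; the grid size, tick rate, and helper names are all assumptions:

```python
import pygame
from collections import deque

# Grid directions as (dx, dy) steps.
DIRECTIONS = {
    pygame.K_UP: (0, -1),
    pygame.K_DOWN: (0, 1),
    pygame.K_LEFT: (-1, 0),
    pygame.K_RIGHT: (1, 0),
}

GRID = 20   # 20x20 cells (illustrative)
CELL = 20   # pixels per cell


def is_reversal(current, proposed):
    """A 180-degree turn is a direction exactly opposite the current one."""
    return proposed == (-current[0], -current[1])


def main():
    pygame.init()
    screen = pygame.display.set_mode((GRID * CELL, GRID * CELL))
    clock = pygame.time.Clock()

    snake = deque([(10, 10), (9, 10), (8, 10)])  # head first
    direction = (1, 0)
    pending = deque()  # queued key presses, so rapid inputs are not lost

    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
            elif event.type == pygame.KEYDOWN and event.key in DIRECTIONS:
                pending.append(DIRECTIONS[event.key])

        # Apply at most one queued turn per tick, silently dropping reversals.
        while pending:
            proposed = pending.popleft()
            if not is_reversal(direction, proposed):
                direction = proposed
                break

        # Move: new head in front, tail dropped (no food logic in this sketch).
        head = (snake[0][0] + direction[0], snake[0][1] + direction[1])

        # Collision detection: hitting a wall or the snake's own body ends the game.
        if not (0 <= head[0] < GRID and 0 <= head[1] < GRID) or head in snake:
            running = False
            continue

        snake.appendleft(head)
        snake.pop()

        screen.fill((0, 0, 0))
        for x, y in snake:
            pygame.draw.rect(screen, (0, 200, 0), (x * CELL, y * CELL, CELL, CELL))
        pygame.display.flip()
        clock.tick(10)

    pygame.quit()


if __name__ == "__main__":
    main()
```

The key design choice is the pending queue: rapid key presses are stored rather than overwriting each other, and reversals are simply skipped, which covers the two input bugs listed above.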
Challenge 2: Electron Cloud Simulation
Initial Phase:
– Both AIs created basic particle simulations (a minimal sketch of this kind of simulation follows the results below)
– GPT-4 produced better physics initially
– Claude included unrequested damping terms
Advanced Features:
– GPT-4 successfully added velocity control
– But failed when asked to add a particle-counting feature
– Claude struggled with the initial physics
– But successfully implemented the advanced features when given GPT-4’s base code
Final Results:
– Draw between Claude and GPT-4
– Claude better at iterative improvements
– GPT-4 better at initial implementation
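For context, here is a minimal sketch of the kind of simulation both models were asked to build: mutually repelling charges in a soft confining well, with an optional velocity-damping term like the one Claude added unrequested. All constants, names, and the interpretation of "particle counting" are my own illustrative assumptions, not either model's actual output:

```python
import numpy as np


def simulate_electron_cloud(n=50, steps=500, dt=0.01, damping=0.0, seed=0):
    """Toy electron cloud: mutually repelling point charges in a soft confining well.

    `damping` is the optional velocity-damping term discussed above (0 disables it).
    All constants are illustrative, not physical SI values.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n, 2))
    vel = np.zeros((n, 2))

    for _ in range(steps):
        # Pairwise Coulomb-like repulsion: F ~ r_hat / |r|^2, softened to avoid blow-ups.
        diff = pos[:, None, :] - pos[None, :, :]        # (n, n, 2) displacement vectors
        dist2 = np.sum(diff**2, axis=-1) + 1e-4         # softened squared distances
        np.fill_diagonal(dist2, np.inf)                 # no self-interaction
        force = np.sum(diff / dist2[..., None]**1.5, axis=1)

        # Soft harmonic confinement keeps the cloud from flying apart.
        force -= 2.0 * pos

        # Optional damping term (the kind of unrequested addition mentioned above).
        force -= damping * vel

        vel += force * dt
        pos += vel * dt

    return pos, vel


if __name__ == "__main__":
    pos, vel = simulate_electron_cloud(damping=0.1)
    # One plausible reading of "particle counting": how many end up inside the unit circle.
    inside = np.sum(np.linalg.norm(pos, axis=1) < 1.0)
    print(f"{inside} of {len(pos)} particles inside the unit circle")
```

Setting damping=0.0 disables the extra term and matches the original request; a small positive value settles the cloud into a roughly stationary shell, which may be why a damping term is tempting to add even when it is not asked for.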
GitHub Copilot Performance
Major Issues:
– Installation compatibility problems with VS Code
– Struggled to read and understand existing code
– Generated incomplete code snippets
– Failed to properly integrate changes
– Unable to fix its own error messages
Limitations:
– Despite running locally in the IDE, it couldn’t use the surrounding code context effectively
– Performed worse than expected for a paid tool
– Failed to leverage its theoretical advantages
Final Verdict
Snake Game:
🥇 Claude 3.5
🥈 GPT-4
🥉 GitHub Copilot
Electron Cloud Simulation:
🥇 Tie between Claude 3.5 and GPT-4
🥉 GitHub Copilot
Key Takeaways:
– Claude 3.5 excels at iterative improvements
– GPT-4 strong at initial implementations
– GitHub Copilot underperformed despite theoretical advantages
– Interface matters: Claude’s artifacts feature proved valuable
– All tools still require human oversight and iteration
Why GitHub Copilot Struggled
This surprised me at first. Copilot runs locally in your IDE, so theoretically it should understand your code better. But in practice, it couldn’t leverage that advantage.
The problem seems to be that Copilot is optimized for code completion, not for understanding and fixing complex bugs. When you need to iterate on a problem, the chat-based interfaces of Claude and GPT-4 work better.
For more on GitHub Copilot’s strengths and limitations, see our detailed Copilot analysis.
The Importance of Iteration
One thing that became clear: no AI tool generates perfect code on the first try. The real question is how well they handle iteration and bug fixes.
Claude’s artifacts feature made this much easier – you can see previous versions side-by-side and quickly compare what changed. With ChatGPT, you’re scrolling through a long conversation trying to figure out what’s different.
Want to improve your Python programming skills? Check out our courses at Training Scientists for expert-led instruction in scientific computing and simulation.