Claude 3.5 vs. ChatGPT 4o vs. GitHub Copilot


We put three leading AI coding assistants through hands-on testing to see which performs best. Claude 3.5, ChatGPT 4o (referred to below as GPT-4), and GitHub Copilot face off in two practical programming challenges: implementing a Snake game and creating an electron cloud simulation.

Challenge 1: Snake Game Implementation

Claude 3.5 Performance:
– Created a working initial version on the first attempt
– Required 6 iterations to fix all remaining issues
– Issues fixed along the way included:
  – Preventing 180-degree turns
  – Handling rapid key inputs
  – Proper collision detection
– Artifacts feature made version comparison easy
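
The three fixes listed above are common Snake pitfalls. The post does not include the code either assistant produced, so the following is only a minimal sketch of one way to handle them, with hypothetical names (`Snake`, `queue_turn`, `step`, `MAX_BUFFERED_TURNS`) and no graphics library. Key presses go into a small buffer and are applied one per tick, each checked against the previously buffered turn, so two quick presses cannot combine into a 180-degree reversal.

```python
from collections import deque

# Grid directions as (dx, dy); y grows downward, as on most screens.
UP, DOWN, LEFT, RIGHT = (0, -1), (0, 1), (-1, 0), (1, 0)
MAX_BUFFERED_TURNS = 2  # assumed limit, chosen for illustration


class Snake:
    def __init__(self, body, direction, grid_size):
        self.body = deque(body)  # body[0] is the head
        self.direction = direction
        self.grid_w, self.grid_h = grid_size
        self.pending = deque()  # buffered turns, consumed one per tick

    def queue_turn(self, new_dir):
        """Buffer a key press unless it would reverse or repeat the last turn."""
        last = self.pending[-1] if self.pending else self.direction
        is_reverse = new_dir == (-last[0], -last[1])
        if is_reverse or new_dir == last or len(self.pending) >= MAX_BUFFERED_TURNS:
            return False
        self.pending.append(new_dir)
        return True

    def step(self, grow=False):
        """Advance one tick. Return False if the snake hits a wall or itself."""
        if self.pending:
            self.direction = self.pending.popleft()
        head_x, head_y = self.body[0]
        new_head = (head_x + self.direction[0], head_y + self.direction[1])
        if not grow:
            self.body.pop()  # the tail cell is vacated this tick, so it is free
        in_bounds = 0 <= new_head[0] < self.grid_w and 0 <= new_head[1] < self.grid_h
        if not in_bounds or new_head in self.body:
            return False
        self.body.appendleft(new_head)
        return True
```

In this sketch, a snake moving right that receives "up" and then "left" within the same tick turns up on the next tick and left on the one after, instead of folding back onto itself.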

GPT-4 Performance:
– Also created a working initial version
– After 8 iterations, still couldn’t fix all issues
– Persistent problems with:
  – Key input handling
  – Game window not opening
– Less convenient interface for multiple iterations

Challenge 2: Electron Cloud Simulation

Initial Phase:
– Claude and GPT-4 both created basic particle simulations
– GPT-4’s initial physics was better
– Claude added damping terms that had not been requested
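
To make the damping point concrete: the post does not show the generated code, so this is a hypothetical sketch of what a damping term looks like in a simple integration step. The function name `euler_step` and the `damping` parameter are assumptions for illustration. A damping term adds a force proportional to velocity that opposes the motion, so energy drains from the system even when the physics being modeled has no friction.

```python
def euler_step(pos, vel, accel, dt, damping=0.0):
    """Advance one particle by one semi-implicit Euler step.

    `damping` is a hypothetical friction coefficient that adds the term
    -damping * v to the acceleration. With damping=0.0 the term vanishes.
    """
    new_vel = [v + (a - damping * v) * dt for v, a in zip(vel, accel)]
    new_pos = [p + v * dt for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


# A free particle (zero acceleration) keeps its speed without damping
# and slows down steadily with it.
pos, vel = [0.0, 0.0], [1.0, 0.0]
damped_pos, damped_vel = [0.0, 0.0], [1.0, 0.0]
for _ in range(100):
    pos, vel = euler_step(pos, vel, [0.0, 0.0], dt=0.1)
    damped_pos, damped_vel = euler_step(
        damped_pos, damped_vel, [0.0, 0.0], dt=0.1, damping=0.5
    )
```

An unrequested term like this changes the simulated physics, which is why it counts against the initial result.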

Advanced Features:
– GPT-4 successfully added velocity control
– GPT-4 failed when adding a particle counting feature
– Claude struggled with the initial physics
– Claude successfully implemented the advanced features when starting from GPT-4’s base code
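
The post does not say exactly how these two features were specified. As one plausible reading, velocity control can be a factor that rescales every particle's velocity, and particle counting can report how many particles lie within a given radius of a point. The sketch below assumes 2D positions and velocities stored as tuples; the function names are hypothetical.

```python
import math


def scale_velocities(velocities, factor):
    """Return new velocities with every component multiplied by `factor`."""
    return [(vx * factor, vy * factor) for vx, vy in velocities]


def count_within(positions, center, radius):
    """Count particles whose distance from `center` is at most `radius`."""
    cx, cy = center
    return sum(1 for x, y in positions if math.hypot(x - cx, y - cy) <= radius)


positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
velocities = [(1.0, 2.0), (-2.0, 0.0), (0.0, 0.5)]

slower = scale_velocities(velocities, 0.5)          # halve every speed
near_origin = count_within(positions, (0.0, 0.0), 1.0)
```

Both helpers return new values rather than modifying the simulation state, which keeps them easy to add to an existing update loop.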

Final Results:
– Draw between Claude and GPT-4
– Claude better at iterative improvements
– GPT-4 better at initial implementation

GitHub Copilot Performance

Major Issues:
– Installation compatibility problems with VS Code
– Struggled to read and understand existing code
– Generated incomplete code snippets
– Failed to properly integrate changes
– Unable to fix the errors in its own code

Limitations:
– Despite running inside the local editor, couldn’t utilize the surrounding code context effectively
– Performed worse than expected for a paid tool
– Failed to leverage its theoretical advantages

Final Verdict

Snake Game:
🥇 Claude 3.5
🥈 GPT-4
🥉 GitHub Copilot

Electron Cloud Simulation:
🥇 Tie between Claude 3.5 and GPT-4
🥉 GitHub Copilot

Key Takeaways:
– Claude 3.5 excels at iterative improvements
– GPT-4 is strong at initial implementations
– GitHub Copilot underperformed despite theoretical advantages
– Interface matters: Claude’s Artifacts feature proved valuable
– All tools still require human oversight and iteration


Want to improve your Python programming skills? Check out our courses at Training Scientists for expert-led instruction in scientific computing and simulation.
