With both Anthropic and OpenAI releasing updated versions of their AI models, it’s time to put them to the test. We compare Claude 3.5 Sonnet and ChatGPT o1 in two practical coding challenges: implementing a Snake game and creating an electron cloud simulation.
Note: Since publishing this comparison, even newer versions of these models have been released. We’ve stopped making these version-specific comparisons because AI models update so frequently that the results become outdated within months. The key takeaways about iterative development and debugging remain relevant regardless of model version.
Challenge 1: Snake Game Implementation
Claude 3.5 Performance:
Required 7 iterations to achieve a fully working game:
1. Initial implementation – basic game working
2. Fix rapid key input issues
3. Address ignored key presses
4. Improve input responsiveness
5. Fix food spawning inside the snake (see the sketch below)
6. Add keyboard shortcuts
7. Final bug fixes
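The food-spawning bug in step 5 is a classic. A minimal fix, assuming the snake body is stored as a list of (x, y) grid cells, looks something like this; it is our own illustration of the bug class, not Claude's actual output:

```python
import random

GRID_WIDTH, GRID_HEIGHT = 30, 20  # illustrative grid dimensions

def spawn_food(snake_body):
    """Pick a random grid cell that the snake does not occupy."""
    free_cells = [
        (x, y)
        for x in range(GRID_WIDTH)
        for y in range(GRID_HEIGHT)
        if (x, y) not in snake_body
    ]
    return random.choice(free_cells)
```

The naive version picks any random cell, so the bug only shows up once the snake grows long enough to cover the chosen spot, which is why it tends to surface a few iterations in.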
ChatGPT o1 Performance:
Achieved a working solution in 4 iterations:
1. Initial implementation
2. Fix direction changes
3. Address key input handling (sketched below)
4. Improve responsiveness
ChatGPT o1 showed a significant improvement over its predecessor, GPT-4, requiring fewer iterations to reach a working solution.
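The key-handling fixes both models needed (rapid presses being dropped, and the snake reversing into itself) usually come down to the same pattern: buffer direction changes and apply at most one per game tick, rejecting 180-degree reversals. The sketch below, with directions stored as (dx, dy) offsets, is illustrative rather than either model's actual code:

```python
from collections import deque

# Directions as (dx, dy) grid offsets.
UP, DOWN, LEFT, RIGHT = (0, -1), (0, 1), (-1, 0), (1, 0)

pending_turns = deque()  # key presses buffered between game ticks

def queue_turn(direction):
    """Record a requested turn instead of applying it immediately,
    so rapid presses within a single frame are not lost."""
    pending_turns.append(direction)

def next_direction(current):
    """Apply at most one queued turn per tick, ignoring 180-degree
    reversals that would drive the snake into its own body."""
    while pending_turns:
        dx, dy = pending_turns.popleft()
        if (dx, dy) != (-current[0], -current[1]):
            return (dx, dy)
    return current
```

Here queue_turn would be called from the keyboard event handler and next_direction once per game tick, just before moving the snake.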
Challenge 2: Electron Cloud Simulation
Claude 3.5:
– Started with promising initial implementation
– Struggled with boundary conditions
– Added unwanted damping terms
– Let particles pass through each other
– Required significant prompting to improve
ChatGPT o1:
– Better initial physics implementation
– Successfully implemented RK4 integration
– Added working particle counter
– Created real-time visualization
– Achieved complete solution independently
The electron cloud simulation revealed a key difference: ChatGPT o1 handled complex physics better and required less hand-holding to implement advanced features.
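To make the RK4 point concrete, here is a rough, self-contained sketch of the kind of integrator involved: N charges repelling each other inside a circular boundary, stepped with classical fourth-order Runge-Kutta. This is our own reconstruction for illustration; the force constants, softening term, and elastic-reflection boundary handling are assumptions, not the model's actual output.

```python
import numpy as np

# Electrons repelling via a softened Coulomb-like force inside a circle,
# integrated with classical fourth-order Runge-Kutta (arbitrary units).
N, RADIUS, DT, STEPS = 20, 1.0, 1e-3, 1000
K = 0.01          # repulsion strength (illustrative)
SOFTENING = 1e-3  # avoids the singularity when particles nearly overlap

rng = np.random.default_rng(0)
pos = rng.uniform(-0.5, 0.5, size=(N, 2))  # start well inside the circle
vel = np.zeros((N, 2))

def acceleration(pos):
    """Pairwise repulsive acceleration from a softened 1/r^2 law."""
    diff = pos[:, None, :] - pos[None, :, :]       # (N, N, 2) separations
    dist2 = np.sum(diff**2, axis=-1) + SOFTENING   # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)                # no self-force
    return K * np.sum(diff / dist2[..., None]**1.5, axis=1)

def rk4_step(pos, vel, dt):
    """One RK4 step for the coupled position/velocity system."""
    def deriv(p, v):
        return v, acceleration(p)

    k1p, k1v = deriv(pos, vel)
    k2p, k2v = deriv(pos + 0.5 * dt * k1p, vel + 0.5 * dt * k1v)
    k3p, k3v = deriv(pos + 0.5 * dt * k2p, vel + 0.5 * dt * k2v)
    k4p, k4v = deriv(pos + dt * k3p, vel + dt * k3v)
    new_pos = pos + dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
    new_vel = vel + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return new_pos, new_vel

def reflect_at_boundary(pos, vel, radius):
    """Bounce any particle that has left the circle back inside."""
    r = np.linalg.norm(pos, axis=1)
    outside = r > radius
    if np.any(outside):
        normal = pos[outside] / r[outside, None]   # outward unit normals
        v_dot_n = np.sum(vel[outside] * normal, axis=1, keepdims=True)
        vel[outside] -= 2 * v_dot_n * normal       # elastic reflection
        pos[outside] = normal * radius             # clamp back onto the rim
    return pos, vel

for _ in range(STEPS):
    pos, vel = rk4_step(pos, vel, DT)
    pos, vel = reflect_at_boundary(pos, vel, RADIUS)
```

A complete solution would add the particle counter and real-time visualization (for example with matplotlib's animation API), which are omitted here for brevity.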
Key Improvements in New Versions
Claude 3.5:
– Slightly better initial code generation
– New ability to handle NPY files (brief example below)
– Minor improvement in iteration count
– Still struggles with complex physics
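For context, NPY is NumPy's binary format for storing a single array. The snippet below shows the round trip; the filename is purely illustrative.

```python
import numpy as np

data = np.random.rand(100, 3)       # any array you want to share
np.save("positions.npy", data)      # writes positions.npy to disk
loaded = np.load("positions.npy")   # reads back the exact same array
```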
ChatGPT o1:
– Longer thinking time (up to 19 seconds)
– Better handling of complex physics
– More complete solutions
– Fewer iterations needed
ChatGPT o1’s longer thinking time was noticeable: you can see it “reasoning” before it generates code, and that extra processing seems to pay off in code quality.
Final Verdict
Snake Game:
Winner: ChatGPT o1 (4 iterations vs Claude’s 7)
Electron Cloud:
Winner: ChatGPT o1 (achieved complete solution independently)
Key Takeaways:
– ChatGPT o1 shows significant improvements over the previous version
– Claude 3.5 shows modest improvements
– Both still require iterative prompting
– Physics simulations remain challenging for AI
– No AI tool generates perfect code on the first try
What This Means for Your Work
The important lesson isn’t “which model won” – that will change with the next release. The important lesson is that all AI coding tools work through iteration.
Even the best models need multiple rounds of refinement to handle edge cases and bugs. Your job as a programmer is to understand what the code does so you can guide the AI through those iterations effectively.
Testing Methodology
To ensure a fair comparison, we used identical prompts for both AIs:
Initial Snake Game:
"Create a Snake game in Python"
Initial Electron Cloud:
"Create a Python simulation for electrons repelling each other within a circular boundary..."
Related Content
For more on AI coding tools and effective development practices:
– Earlier AI Comparison (Claude 3.5 vs GPT-4 vs Copilot)
– GitHub Copilot Deep Dive
– Debunking AI Programming Myths
Want to learn more about using AI tools for coding? Check out our Python courses at Training Scientists, where we teach you how to effectively leverage AI while building strong programming fundamentals.



