You can now run prompt evaluations from the command line using the new gh models eval command. This evaluates prompts defined in a .prompt.yml file using the same built-in evaluators available in the GitHub Models UI, including string match, similarity to expected outputs, custom LLM-as-a-judge evaluators, and more.
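For reference, a minimal .prompt.yml might look like the sketch below. This is an illustrative assumption, not the authoritative schema (see the GitHub Models documentation for that); the model identifier, test data, and evaluator names are placeholders.

# Illustrative sketch only — consult the GitHub Models docs for the
# authoritative .prompt.yml schema.
name: Summarizer test
description: Checks that summaries stay short and on-topic
model: openai/gpt-4o-mini
messages:
  - role: system
    content: You are a concise assistant.
  - role: user
    content: "Summarize this text: {{input}}"
testData:
  - input: "GitHub Models now supports prompt evaluations from the CLI."
    expected: "You can evaluate prompts from the command line."
evaluators:
  - name: mentions-cli            # string-match evaluator
    string:
      contains: "command line"
  - name: similarity-to-expected  # built-in similarity evaluator
    uses: github/similarity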

This makes it easier to test model quality early and often, right from your terminal or CI workflow.

gh models eval my_prompt.prompt.yml

You’ll get a summary of test results for each case, including model output and evaluation scores.

For programmatic use, you can output results in JSON format:

gh models eval my_prompt.prompt.yml --json

The JSON output includes detailed test results, evaluation scores, and summary statistics that can be processed by other tools or CI/CD pipelines.
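As a sketch of that kind of post-processing, you could pipe the output through a tool like jq in a shell step. Note that the summary field name below is an assumption about the JSON shape, not a documented contract, so inspect your actual output before depending on it.

# Run the evaluation and capture the JSON report.
gh models eval my_prompt.prompt.yml --json > results.json

# Pull out the summary statistics (field name assumed for illustration).
jq '.summary' results.json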

This new release also improves compatibility with the existing GitHub Actions integration for GitHub Models, making automated evaluations simpler to run as part of your Actions workflow. For example, you can run evaluations automatically whenever a .prompt.yml file changes, as in the sketch below.
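This workflow is a sketch under assumptions: the gh-models extension install step, the models: read permission, and the trigger paths are illustrative, so adapt them to your repository.

# Sketch of a possible workflow; adjust paths, permissions, and file
# names for your repository.
name: Evaluate prompts
on:
  push:
    paths:
      - "**/*.prompt.yml"
permissions:
  contents: read
  models: read   # assumed permission for calling GitHub Models from Actions
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install the GitHub Models CLI extension
        run: gh extension install github/gh-models
        env:
          GH_TOKEN: ${{ github.token }}
      - name: Evaluate the prompt
        run: gh models eval my_prompt.prompt.yml
        env:
          GH_TOKEN: ${{ github.token }}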

Start building AI apps with GitHub Models today

GitHub Models and all of our AI development tooling, including prompt editing and lightweight evaluations, are now available to all GitHub users in public preview. Try them out by enabling them in your repository or organization, or learn more in our documentation.

Help us shape what’s next

The Models CLI is open source on GitHub. Check out the code, file issues, or contribute!

We’re just getting started, and your feedback helps guide our roadmap. Join the community discussion to share your thoughts and connect with other developers building the future of AI on GitHub.