Cerebras Systems

Computer Hardware

Sunnyvale, California 73,670 followers

AI insights, faster! We're a computer systems company dedicated to accelerating deep learning.

About us

Cerebras Systems is accelerating the future of generative AI. We’re a team of pioneering computer architects, deep learning researchers, and engineers building a new class of AI supercomputers from the ground up. Our flagship system, the Cerebras CS-3, is powered by the Wafer Scale Engine 3, the world’s largest and fastest AI processor. CS-3s cluster effortlessly into the largest AI supercomputers on Earth while abstracting away the complexity of traditional distributed computing. From sub-second inference speeds to breakthrough training performance, Cerebras makes it easier to build and deploy state-of-the-art AI, from proprietary enterprise models to open-source projects downloaded millions of times.

Here’s what makes our platform different:
🔦 Sub-second reasoning – instant intelligence and real-time responsiveness, even at massive scale
⚡ Blazing-fast inference – up to 100x performance gains over traditional AI infrastructure
🧠 Agentic AI in action – models that can plan, act, and adapt autonomously
🌍 Scalable infrastructure – built to move from prototype to global deployment without friction

Cerebras solutions are available in the Cerebras Cloud or on-prem, serving leading enterprises, research labs, and government agencies worldwide.
👉 Learn more: www.cerebras.ai
Join us: https://cerebras.net/careers/

Website
http://www.cerebras.ai
Industry
Computer Hardware
Company size
201-500 employees
Headquarters
Sunnyvale, California
Type
Privately Held
Founded
2016
Specialties
artificial intelligence, deep learning, natural language processing, and inference

Products

Locations

Employees at Cerebras Systems

Updates


    *OpenAI gpt-oss-120B runs at 3,000 toks/sec on Cerebras, the world’s fastest AI inference* 🚀

    What to know
    We are excited to announce that OpenAI’s new open model, gpt-oss-120B, is live on Cerebras – a major step forward in our mission to give everyone access to the best open models running at world-record speeds! This collaboration delivers the best of GenAI: openness, intelligence, top speed, lower cost, and ease of use – without compromises.

    ✅ Breaking it down
    ⚡️ World record: ~3,000 tokens/second
    ⚡️ Reasoning on par with o4-mini, Gemini 2.5 Flash, and Claude 4 Opus
    ⚡️ Easy to use, with OpenAI API compatibility, behavior, and quality
    ⚡️ 131K context length
    ⚡️ $0.25 | $0.69 per M tokens

    👉 Try it today
    Try chat: https://lnkd.in/gEJJ2pfY
    Get your free API key: https://lnkd.in/gQR-TaPG
    Available via Hugging Face, OpenRouter, and Vercel
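The post advertises OpenAI API compatibility but includes no code. As a minimal sketch of what that compatibility implies, the snippet below assembles the JSON body an OpenAI-style chat-completions client would POST; the endpoint URL and the lowercase model identifier are assumptions (the post names only "gpt-oss-120B"), so check the Cerebras docs before use.

```python
import json

# Hypothetical endpoint; the post only claims "OpenAI API compatibility",
# so this exact URL is an assumption, not taken from the announcement.
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "gpt-oss-120b",
                       max_tokens: int = 512) -> dict:
    """Assemble the request body an OpenAI-compatible client would POST."""
    return {
        "model": model,  # model identifier assumed from the post's title
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("Summarize wafer-scale inference in one sentence.")
print(json.dumps(body, indent=2))
```

Because the wire format mirrors OpenAI's, any existing OpenAI SDK or HTTP client should be able to send this body unchanged once pointed at the Cerebras base URL.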


    OpenAI released GPT-OSS-120B, their first open-weight Mixture-of-Experts (MoE) model. 💡 Perfect timing, because a crucial part of any MoE model is routing, and Part 2 of our MoE 101 guide breaks down the exact routing strategy OpenAI is using. It’s often the part that looks simple on paper… but breaks in practice.

    In this post, Daria Soboleva explains:
    • A full comparison table of routing methods (which do you think OpenAI chose?)
    • The 3 core routing approaches that power most MoE models today
    • Implementation tips you won’t find in academic papers

    With production-scale MoE now in the open, understanding routing isn’t just academic. It’s the key to scaling these models. Skip the theory that breaks in prod. Start building smarter 🧠 https://lnkd.in/gH7kizcR

    #moe #mixtureofexpertsmodel #oss
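The post doesn't show the routing mechanics it teases. As an illustration only, here is a minimal sketch of top-k (token-choice) routing, the most common of the routing approaches the guide covers; whether gpt-oss uses exactly this variant is not stated in the post, and the function names are my own.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-probability experts for one token and
    renormalize their gate weights so the kept weights sum to 1."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]

# One token, four experts: experts 0 and 2 win and share the gate weight.
print(top_k_route([2.0, 0.5, 1.0, -1.0], k=2))
```

The renormalization step is the part that "looks simple on paper": in production you also need load-balancing losses and capacity limits so a few experts don't absorb all the tokens, which is exactly the territory the linked guide goes into.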

  • Let's set aside charts and graphs for a minute, and see what blazing fast Cerebras Inference + OpenAI's gpt-oss-120b open-source model look like IRL. Take a look. 🎥 You can use Cerebras' gpt-oss-120b to build realistic speech-to-speech voice interfaces with emotion via Hume AI's new EVI 3. Perfect for your next voice AI project! Try it yourself... Ask it a complex math question mid-conversation, and it’ll solve it while keeping the chat going. https://lnkd.in/gsgZF43f Let’s see what you build.

  • Good things take time. But AI should be instant. OpenAI Thank you for the partnership! 🔶 🚀

    View profile for Andrew Feldman

    Founder and CEO, Cerebras Systems

    In 2016, Sam Altman and I first met. OpenAI was a vision. Cerebras Systems was a PowerPoint. Sam became one of the early investors in Cerebras Systems. In the years that followed, Cerebras Systems and OpenAI met frequently to explore working together. But the timing was too early – LLMs hadn’t been invented yet.

    Today, the story comes full circle. OpenAI just released its most powerful open-weight reasoning model – and it runs fastest on Cerebras Systems. Not a little bit faster than the competition. It smokes the competition. Running on our third-generation Wafer Scale Engine, OpenAI gpt-oss-120B runs at up to 3,000 tokens/s – the fastest speed achieved by an OpenAI model in production. Reasoning tasks that take minutes on NVIDIA GPUs take a single second on Cerebras Systems.

    Good things take time.

  • We have the models. We have the fastest speed. 💡 You bring the ideas. Let's build: https://lnkd.in/gqHK_d87

    View organization page for Artificial Analysis

    16,165 followers

    Cerebras has been demonstrating its ability to host large MoEs at very high speeds this week, launching Qwen3 235B 2507 and Qwen3 Coder 480B endpoints at >1,500 output tokens/s

    ➤ Cerebras Systems now offers endpoints for both Qwen3 235B 2507 Reasoning & Non-reasoning. Both models have 235B total parameters with 22B active.
    ➤ Qwen3 235B 2507 Reasoning offers intelligence comparable to o4-mini (high) & DeepSeek R1 0528. The Non-reasoning variant offers intelligence comparable to Kimi K2 and well above GPT-4.1 and Llama 4 Maverick.
    ➤ Qwen3 Coder 480B has 480B total parameters with 35B active. This model is particularly strong for agentic coding and can be used in a variety of coding agent tools, including the Qwen3-Coder CLI.
    ➤ One of the most impressive views of these large models on Cerebras is the end-to-end response time achievable on Qwen3 235B 2507 (Reasoning): the model can get through input processing, reasoning, and output for our standard 1K-token test query in <2 seconds.

    Cerebras’ launches represent the first time this level of intelligence has been accessible at these output speeds and have the potential to unlock new use cases – like using a reasoning model for each step of an agent without having to wait minutes.

  • Cerebras Systems reposted this

    🚀 Qwen3-Coder 480B is now LIVE on Cerebras + 🔥 Announcing Cerebras Code monthly plans

    Running on Cerebras Code, Qwen3-Coder screams at 2K tokens/sec (20x faster than GPU) with a 131K-token context, no proprietary IDE lock-in, and no weekly limits! 👀

    TLDR
    • Frontier coding accuracy on par with Claude 4 Sonnet per SWE-bench
    • 20x faster at 33% lower cost compared to Claude 4 Sonnet
    • Open-weight freedom – full control and the ability to plug into any OpenAI-compatible IDE (Cursor, Continue.dev, Cline, RooCode… you name it)
    • Easily accessible through two new monthly plans with no weekly limits:
    👉 Cerebras Code Pro for indie developers ($50/mo)
    👉 Cerebras Code Max for power users with 5x rate limits ($200/mo)

    👩💻 Get your free API key, drop it into your favorite editor, and ship faster than ever: https://lnkd.in/gMdnXyE9
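"Drop it into your favorite editor" usually means pointing the tool's standard OpenAI environment variables at the new backend. A hedged sketch of that wiring is below; the base URL and model name are assumptions inferred from the post's OpenAI-compatibility claim, not official values, so confirm both against your Cerebras dashboard.

```shell
# Hypothetical setup for any OpenAI-compatible editor or CLI:
# redirect the standard OpenAI env vars to the Cerebras endpoint.
# Both values below are assumptions -- verify in the Cerebras docs.
export OPENAI_API_KEY="YOUR_CEREBRAS_API_KEY"
export OPENAI_BASE_URL="https://api.cerebras.ai/v1"

# Then pick the model by name in the tool's settings, e.g.:
#   model: qwen-3-coder-480b   (identifier is an assumption)
```

Tools that read these variables (most OpenAI-SDK-based editors and CLIs do) need no code changes; only the key, base URL, and model name differ from a stock OpenAI setup.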


  • 🚀 Qwen3-235B-A22B-Thinking-2507 is now live! This model is smarter and faster than DeepSeek R1, per Artificial Analysis ✅

    Key Facts:
    - World’s fastest frontier reasoning model
    - ~1,700 tokens/sec (~26x faster than GPU)
    - Targeted at coding, agentic workflows, and multilingual use cases
    - 1.7s time-to-full-answer
    - 131K context length
    - Input: $0.60 | Output: $1.20 per M tokens
    - Open weights for full control

    👉 Get your free API key to power your apps: https://lnkd.in/gQR-TaPG
    🤗 Head over to Hugging Face and start building with blazing-fast thinking: https://lnkd.in/gZP5amNA
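To make the quoted pricing concrete, here is a small cost estimator using only the per-million-token rates stated in the post ($0.60 input, $1.20 output); the example token counts are illustrative, not from the post.

```python
# Prices quoted in the post, in dollars per million tokens.
INPUT_PER_M = 0.60
OUTPUT_PER_M = 1.20

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted per-million-token rates."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# Illustrative: a 2,000-token prompt with a 10,000-token
# reasoning trace plus answer.
print(f"${request_cost(2_000, 10_000):.4f}")  # → $0.0132
```

At these rates even long reasoning traces cost fractions of a cent per request, which is why the post pitches the model for agentic workflows that issue many calls.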

  • Qwen 3 235B Instruct on Cerebras runs at an unprecedented 1,400 tokens/s – making it the world’s only frontier model that can generate code, run agents, and carry conversations at instant speed. Don't just take our word for it – get your API key and put it to work: https://lnkd.in/gqHK_d87
