Zeus
- 1 Introduction
- 2 Performance Improvements
- 2.1 Content Creation
- 2.1.1 10x Path Tracing
- 2.1.2 Glowstick
- 2.1.3 Materials & Textures
- 2.2 Content Consumption
- 2.2.1 Real-Time Path Tracing
- 2.2.2 Game Physics
- 2.3 HPC
- 3 Expandable Memory
- 4 Built-In Ethernet
- 5 Zeus Cluster
- 5.1 High Performance CPU Core
- 5.2 FP64 Vector Cores
- 5.3 Accelerators
- 5.4 Video Encoding & Decoding
- 5.5 Matrix Cores
- 5.6 Router
- 5.7 Telemetry
- 6 Zeus Chiplet
- 7 Zeus PCIe Card
- 8 Zeus Server
- 9 Zeus Rack
- 10 Software
- 10.1 Core Software
- 10.1.1 Operating Systems
- 10.2 Porting & Optimizing
- 10.3 Telemetry & Debugging
Introduction
Zeus is a next-generation graphics processor with a unique design that combines scalar (CPU) cores, vector (GPU) cores, and other specialized processors. In designing Zeus, we took a first principles approach, examining performance, efficiency, and scalability limitations of legacy GPU architectures. The result is Zeus: a GPU that outperforms legacy GPUs in key workloads, eliminates scaling issues, and drives down power consumption significantly.
Zeus brings a host of firsts to modern consumer-grade GPUs:
Native high-speed Ethernet (up to 800 Gbps interfaces)
Expandable memory using industry-standard SO-DIMMs and DIMMs
Ability to run Linux directly on the GPU
Out of Band management & telemetry using Redfish
The video below introduces Zeus:
Performance Improvements
Zeus is optimized for maximum performance in key workloads:
Content Creation: Film, TV, Broadcast/Transcoding, Advertising, Architecture Visualization (ArchViz), Game Development, XR
Content Consumption: Gaming, Video Transcoding
HPC (High Performance Computing): Compute workloads, Physics Simulations
Content Creation
3D artists and designers depend on high-quality renders of their work. Path tracing is the best rendering method, producing photorealistic results.
For a visual introduction to path tracing, we recommend this video: https://www.youtube.com/watch?v=frLwRLS_ZR0
Historically, path tracing has been too slow to use outside specific scenarios where the user can wait minutes for a low-resolution preview, hours for a single 4K frame, or months for a full film:
All this physical realism came with a high price tag: big budget productions in media and entertainment were spending vast resources on hardware, storage and computation as the bar of photorealism was raised ever higher. Budgets and artist time expanded for larger and larger projects. Rasterized pipelines started to fall by the wayside as the increased realism and quality of ray traced projects became the norm. However, the rendering power of a standard artist's workstation was not increasing at the exponential rate of the complexity of projects.
(from https://bolt.graphics/history-of-rendering-rasterization-ray-tracing-path-tracing/)
This performance limitation has forced developers and artists to create complex workarounds to try to get similar visual output without performing the computations required for the full path traced image:
In spite of these issues, artists continue to use ray tracing to solve complex lighting and realism issues. Without ray tracing, the artist has to spend a lot of time faking lighting effects like shadows and reflections. This has led to the quick adoption of ray tracing in other industries that require a higher level of realism: product advertising, architecture, and of course gaming. As these (and other) industries adopt ray tracing, artists and consumers find it hard to go back to legacy rasterization.
(from https://bolt.graphics/ray-tracing-grows-across-industries-despite-hardware-issues/)
To solve this widespread industry problem, we are building Zeus to drive toward real-time path tracing and make it available to every artist, designer, and engineer!
10x Path Tracing
Zeus improves path tracing performance by up to 10x vs the Nvidia RTX 5090. This is achieved by redesigning the GPU for path tracing, instead of shoehorning it into a legacy rasterizer architecture. A render that takes 30 minutes on an Nvidia GPU takes less than 15 minutes on Zeus 1c26-032, our entry-level GPU, which consumes only 120 watts.
Performance is measured by timing how long each GPU takes to render scenes of varying complexity. The Nvidia GPUs are benchmarked using Blender Cycles with OptiX, the AMD GPUs using Cycles with HIP, and Bolt Zeus using Glowstick.
The scenes rendered vary from ~100,000 triangles and a few textures to >4 million triangles and hundreds of textures. The tested scenes are always in OpenUSD format, with MaterialX materials.
Glowstick
In order to test and improve the effectiveness of our path tracing hardware, we needed to build a testbench renderer that performs path tracing. Initially, we used it internally to produce renders to compare against existing renderers like Arnold, Iray, and Cycles. We then decided to productize this renderer and named it Glowstick.
Glowstick is our in-development path tracer that is co-designed with Zeus hardware. It will be included at no cost with Zeus GPUs. We’ll talk more about DCC integrations and asset/material libraries below.
We chose to use open standards whenever possible. OpenUSD is becoming widely adopted by artists and developers across many studios and content creation tools as the industry moves away from proprietary and sometimes incompatible asset formats.
We are integrating Glowstick into many of your favorite content creation tools. The way Glowstick is integrated into those tools varies depending on what interfaces the tool provides.
Houdini, Maya, 3ds Max, and Nuke provide support for Hydra delegates. Glowstick is already integrated inside usdview as a Hydra delegate, and we are working on porting that integration to the tools listed above.
SketchUp, Unreal Engine, Blender, Revit, and many other content creation tools use tool-specific plugin interfaces to integrate other renderers. Glowstick will be available as installable plugins for those tools.
Materials & Textures
Physically-based materials are key to path tracing. Materials define how light interacts with the scene: whether the object is a highly polished metal, an emissive light, or a translucent glass. Materials can be scanned or generated procedurally. Scanned materials are usually high resolution (4K, 8K, and higher).
Scenes with hundreds of physically-based materials require lots of memory. Legacy GPUs have slowly increased the amount of soldered memory over time. CPUs have always provided the flexibility to install hundreds of GBs and even TBs of memory.
Many GPU renderers crash when trying to render a scene that requires more memory than is present on the GPU. It is then up to the artist or developer to work around this. Material sizes can be reduced, but only by also reducing material quality, and in many cases this is a difficult tradeoff to make.
Zeus solves this legacy-GPU problem by enabling users to install up to 384 GB on a single PCIe card: 4x as much as the RTX 6000 Pro (96 GB) and 12x as much as the RTX 5090 (32 GB).
Content Consumption
Real-Time Path Tracing
High quality real-time experiences like games, concerts, and metaverses are also dependent on real-time path tracing. Today, enabling path tracing features in a game reduces performance by >50%. In the case of Cyberpunk 2077, this means going from almost 60 fps without path tracing enabled, to under 20 fps (Nvidia 4090, 4K resolution, ultra settings).
The chart below shows the path tracing performance limitations of legacy GPUs. High quality path tracing relies on multiple rays per pixel bouncing around the scene multiple times.
Performing a simple raycast (1 ray per pixel, 0 bounces) is feasible even on an Intel B580. However, calculating soft shadows requires multiple rays per pixel and at least 1 bounce. To get around this limitation, most real-time applications reduce the resolution of the ray tracing layers, and sometimes their update frequency to half the frame rate or below.
Legacy GPUs simply do not have the ray tracing performance (and certainly not the path tracing performance) to fully render scenes in real time. Zeus 1c (120 watts) dramatically increases the ray-triangle intersection budget to almost 26 rays per pixel; since a path with N bounces traces roughly N+1 ray segments, that budget covers 4-5 paths per pixel with 3-4 bounces each (e.g., 5 × (1 + 4) ≈ 25 segments). Zeus 2c (250 watts) doubles throughput and enables real-time path tracing for the first time without requiring resolution decreases.
Simply measuring ray-triangle intersections does not represent all of the computations required to path trace a scene. Acceleration structure generation, traversal, and shading all take time. The time spent in those stages varies depending on the size and makeup of the scene being rendered, the number and complexity of the materials, and of course moving geometry (animations, deformations, etc.).
Zeus also offloads acceleration structure generation and traversal to further increase performance. For example, generating the acceleration structure for Terminator.usd, a scene with over 2 million triangles, takes around 0.6 milliseconds. Acceleration structure building and traversal on Zeus exerts minimal compute pressure on the scalar, vector, or matrix cores: it is fully offloaded.
For a performance guide on Lightning, the path tracing accelerator included in Zeus, see the Developer Guide: Path Tracing.
Game Physics
Coming soon!
HPC
Today’s world is dependent on simulations to discover materials and drugs, design parts, simulate systems, and explore growing areas of scientific research. Simulations are used everywhere:
Carmakers historically used expensive wind tunnels to measure the aerodynamic effectiveness of a vehicle under development. Using supercomputers instead eliminates the need for a manufactured sample vehicle, clay model, and wind tunnel. Over multiple design iterations this significantly reduces cost and the time spent in the feedback loop.
A similar story unfolds in other industries, from Oil & Gas to Aerospace: manufacturing a sample takes time and money. The future of advanced manufacturing foregoes physical prototyping for digital design and testing. Many enterprises have written in-house solvers for various purposes, enabling them to work around hardware limitations.
(The Decline of) Accurate HPC
HPC has historically required at least FP32 accuracy and preferred FP64. Computations that depend on previous results accumulate error: when running a computation over many timesteps, error compounds and the final result can be significantly inaccurate.
As Hyperion Research noted in their April 2025 HPC User Forum update, legacy GPU vendors are reducing the amount of FP64-capable cores in favor of lower precision like FP16, FP8, and MX6.
Similarly, Timothy Prickett Morgan (The Next Platform, emphasis ours) states:
We are snobs about FP64 and we want those running the most intense simulations in the world for weather and climate modeling, or materials simulation, or turbulent airflow simulation, and dozens of other key HPC workloads that absolutely need FP64 floating point to be getting good value for the compute engines they are buying.
…
With the “Blackwell” B100 announced last year and ramping now, the peak vector FP64 performance was only 30 teraflops, which is a 10.5 percent decline from the peak FP64 performance of the Hopper GPU. And on the tensor cores, FP64 was rated the same 30 teraflops, which is a 55.2 percent decline for peak FP64 performance on the Hopper tensor cores. To be sure, the Blackwell B200 is rated at 40 teraflops on FP64 for both the vector and tensor units, and on the GB200 pairing with the “Grace” CG100 CPU, Nvidia is pushing it up to 45 teraflops peak FP64 for the vectors and the tensors. The important thing is tensor FP64 is not double that of vector FP64 with Blackwells, and in many cases, customers moving to Blackwell will pay more for less FP64 oomph than they did for Hopper GPUs.
Unlike our competitors, which dedicate a significant area of the chip for lower-precision math, we dedicate a significant area of the chip to higher-precision math.
(The Lack of) Consumer HPC
Legacy GPUs with sufficient FP64 processing power have been restricted to workstation or datacenter use. This tremendously restricts the market, as individuals, power users, and researchers need access to large amounts of compute to run useful simulations.
We’ve chosen to go down a different path: all Zeus GPUs, whether consumer or enterprise, have fully unlocked FP64 ALUs:
We enable all markets and users across many pricing tiers to have access to accurate computations!
Enhancing Key Workload Performance
Simply restoring the 2x FP32:FP64 compute ratio was not enough for us. We went further and co-designed Zeus hardware alongside key HPC software. HPC has many “key workloads”, including molecular dynamics, fluid dynamics, and electromagnetic simulation. We started by optimizing electromagnetic simulations.
Zeus performs electromagnetic simulations 300 times faster than Nvidia B200, without sacrificing accuracy or data output capabilities. The software package is called Apollo.
On the FPGA platform where we test our hardware and software, we already beat the performance of 64-core CPUs and H100 GPUs!
Expandable Memory
We’re giving the power back to the user by enabling them to install the amount of memory they need.
| Zeus SKU | Soldered LPDDR5X | Maximum DDR5 | Total Max Capacity |
|---|---|---|---|
| 1c26-032 | 32 GB | 128 GB | 160 GB* |
| 2c26-064 | 64 GB | 256 GB | 320 GB* |
| 2c26-128 | 128 GB | 256 GB | 384 GB* |
| 4c26-256 | 256 GB | 2,048 GB | 2,304 GB** |
* Using 64 GB 5600 MT/s SO-DIMMs, like Crucial CT64G56C46S5. Users can choose lower density & higher bandwidth modules.
** Using 256 GB 4800 MT/s RDIMMs, like Samsung M321RBGA0B40-CWK. Users can choose lower density & higher bandwidth modules.
We selected LPDDR5X for soldered memory due to its low latency (<100 ns), low cost per bit (>3x lower than GDDR/HBM), and wide availability through multiple suppliers.
We selected DDR5 for expandable memory for the same reasons, including low latency, low cost per bit, and wide availability through many suppliers.
Zeus memory can be configured in various ways:
LPDDR5X only, when no DDR5 is installed.
Flat, where both LPDDR5X and DDR5 appear as separate address spaces.
Tiered, where only DDR5 is visible and LPDDR5X acts as a transparent cache.
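In the flat configuration, software decides which pool each buffer lives in. The sketch below is ours, under the assumption that the LPDDR5X and DDR5 pools are exposed to Linux as separate NUMA nodes (the node numbers are purely illustrative): a renderer could keep a latency-sensitive working set in LPDDR5X and bulk assets in expandable DDR5 using standard libnuma calls.

```c
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA policy API not available\n");
        return 1;
    }
    /* Illustrative assumption: node 0 = soldered LPDDR5X, node 1 = DDR5 DIMMs. */
    const int lpddr_node = 0, ddr5_node = 1;

    size_t hot_bytes  = (size_t)64 << 20;   /* 64 MB hot working set        */
    size_t bulk_bytes = (size_t)8  << 30;   /* 8 GB material/texture pool   */

    /* Latency-sensitive buffers in LPDDR5X, bulk assets in expandable DDR5. */
    void *hot  = numa_alloc_onnode(hot_bytes, lpddr_node);
    void *bulk = numa_alloc_onnode(bulk_bytes, ddr5_node);
    if (!hot || !bulk) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* ... render / simulate ... */

    numa_free(hot, hot_bytes);
    numa_free(bulk, bulk_bytes);
    return 0;
}
```

In the tiered configuration no such placement is needed, since LPDDR5X caching is transparent to software.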
Built-In Ethernet
We’ve integrated Ethernet directly into Zeus, eliminating the need for separate network cards (and DPUs). This reduces power consumption, latency, complexity, and cost. Zeus can talk over standard Ethernet to any other Ethernet device. This enables new workflows that previously were too complex and expensive to scale:
Zeus includes support for RDMA over Converged Ethernet v2 (RoCE v2), enabling data transfer between the memories of connected devices without CPU intervention.
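Because RoCE v2 rides on the standard Linux RDMA verbs stack, existing RDMA tooling and code should carry over. The minimal sketch below (ours; it assumes the Zeus Ethernet ports are exposed through an ordinary verbs provider) simply enumerates the available RDMA devices:

```c
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void) {
    int num = 0;
    struct ibv_device **devices = ibv_get_device_list(&num);
    if (!devices) {
        perror("ibv_get_device_list");
        return 1;
    }
    /* Each RoCE-capable port shows up here alongside any other RDMA NICs. */
    for (int i = 0; i < num; i++)
        printf("RDMA device %d: %s\n", i, ibv_get_device_name(devices[i]));
    ibv_free_device_list(devices);
    return 0;
}
```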
Zeus Cluster
Historically, CPU and GPU cores were physically and logically far away from each other, requiring a host-device programming model. The application running on the CPU accesses the GPU device through a PCIe interface. The CPU and GPU have their own memories, and running a workload that performantly uses both the CPU and GPU requires careful programming.
Zeus eliminates the host-device programming model by enabling applications to access vector, matrix, and other accelerators through CPU instructions. This:
Reduces latency when, for example, scalar and vector code are tightly coupled and high latency to the vector unit would limit performance.
Reduces complexity and latency when sharing data between blocks inside a cluster.
Zeus adopts and extends the RISC-V ISA to enable straightforward interoperability with the wider ecosystem.
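As an illustration of this programming model, the sketch below is ours (it assumes a standard RVV 1.0 toolchain such as clang or gcc with the vector extension enabled, and is not Bolt-provided code): a DAXPY kernel runs on the cluster's CPU and vector unit through ordinary C and RVV intrinsics, with no device allocations or host-to-device copies.

```c
#include <riscv_vector.h>
#include <stddef.h>

/* y = a*x + y, executed directly by the cluster's CPU and vector unit.
   x and y are ordinary pointers in the same address space the scalar
   code uses; there is no device allocation or host-to-device copy. */
void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e64m8(n - i);            /* elements this pass */
        vfloat64m8_t vx = __riscv_vle64_v_f64m8(x + i, vl); /* load x             */
        vfloat64m8_t vy = __riscv_vle64_v_f64m8(y + i, vl); /* load y             */
        vy = __riscv_vfmacc_vf_f64m8(vy, a, vx, vl);        /* y += a * x         */
        __riscv_vse64_v_f64m8(y + i, vy, vl);               /* store y            */
        i += vl;
    }
}
```

The same pointers remain visible to the scalar code before and after the vector loop, which is what removes the separate host and device memory management step.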
As Zeus is a cloud-native GPU, each cluster can be isolated from the others by controlling the router.
High Performance CPU Core
Each Zeus cluster includes a high performance RVA23 RISC-V CPU that offers high single-thread performance:
SPECint2006 ~25/GHz
Fmax ~3 GHz
8-wide dispatch, 12-stage pipeline
48-bit addressing
This CPU core runs Linux, eliminating the need for a separate PCIe-connected CPU host.
FP64 Vector Cores
Each Zeus cluster includes a wide vector processor composed of 32x FP64 ALUs (conformant to RVV 1.0), configured as:
ELEN=64
DLEN=512
VLEN=512
In addition, each vector processor has a low-latency, high bandwidth interface to 2 MB of L1 cache. This results in each FP32 ALU having access to 64 KB of L1 cache, compared to ~5 KB on GB200.
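To make these parameters concrete: with VLEN=512 and 64-bit elements (SEW=64), a single vector register holds 8 doubles, and grouping 8 registers (LMUL=8) lets one instruction operate on 64 doubles. A minimal sketch of ours (assuming a standard RVV 1.0 toolchain built with -march=rv64gcv) that queries these limits at runtime:

```c
#include <riscv_vector.h>
#include <stdio.h>

int main(void) {
    /* With VLEN = 512 and SEW = 64, LMUL = 1 gives 512 / 64 = 8 doubles
       per vector register; LMUL = 8 groups registers for 64 doubles per op. */
    printf("doubles per register (m1):   %zu\n", __riscv_vsetvlmax_e64m1());
    printf("doubles per grouped op (m8): %zu\n", __riscv_vsetvlmax_e64m8());
    return 0;
}
```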
Vectorizable workloads that are memory-bound on other architectures will perform better on Zeus.
For example, a simple peak floating point benchmark intentionally limits data movement to test the theoretical maximum FLOPs:
```cpp
static const char glsl_fp64_p4_data[] = R"(
#version 450
layout (constant_id = 0) const int loop = 1;
layout (binding = 0) writeonly buffer c_blob { double c_blob_data[]; };
void main()
{
const uint gx = gl_GlobalInvocationID.x;
const uint lx = gl_LocalInvocationID.x;
dvec4 c = dvec4(gx);
dvec4 a = c + dvec4(0,1,2,-3);
dvec4 b = dvec4(lx) + dvec4(2,3,5,-7);
for (int i = 0; i < loop; i++)
{
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
c = a * c + b;
}
c_blob_data[gx] = (c[0] + c[1]) + (c[2] + c[3]);
}
)";Reading/writing to more registers or L1 cache typically reduces performance significantly. On Zeus, performance degradation occurs after reading/writing to more registers and L1 cache.
In addition, the vector processor has sufficient bandwidth to memory to keep the ALUs fed (calculated using 5.6 Gbps DDR5 DIMMs and 9.5 Gbps LPDDR5X modules):
Installing faster 8.8 Gbps memory modules increases bandwidth by ~14%.
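The ~14% figure follows from the interface counts and widths listed in the Zeus 1c26 chiplet table further below. A small sketch of the arithmetic (ours; it assumes the 80-bit DDR5 channels carry 64 data bits plus ECC and ignores protocol efficiency losses):

```c
#include <stdio.h>

/* Peak bandwidth in GB/s from per-pin speed (Gbps) and total data width (bits). */
static double peak_gbytes(double gbps_per_pin, int data_bits) {
    return gbps_per_pin * data_bits / 8.0;
}

int main(void) {
    double lpddr    = peak_gbytes(9.5, 8 * 32); /* 8x 32-bit LPDDR5X @ 9.5 Gbps -> ~304 GB/s      */
    double ddr5_5_6 = peak_gbytes(5.6, 2 * 64); /* 2x 64-bit (data) DDR5 @ 5.6 Gbps -> ~90 GB/s   */
    double ddr5_8_8 = peak_gbytes(8.8, 2 * 64); /* same channels with 8.8 Gbps DIMMs -> ~141 GB/s */

    double base   = lpddr + ddr5_5_6;           /* ~394 GB/s */
    double faster = lpddr + ddr5_8_8;           /* ~445 GB/s */
    printf("uplift from faster DIMMs: %.0f%%\n", 100.0 * (faster / base - 1.0)); /* ~13-14% */
    return 0;
}
```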
Accelerators
Each Zeus cluster includes various accelerators made available to applications using custom RISC-V extensions. This fulfills one of the original intents of the RISC-V ISA:
One use-case is developing highly specialized custom accelerators, designed to run kernels from important application domains. These might want to drop all but the base integer ISA and add in only the extensions that are required for the task in hand. The base ISAs have been designed to place minimal requirements on a hardware implementation, and have been encoded to use only a small fraction of a 32-bit instruction encoding space.
Accelerators included in Zeus typically fall into one of three categories:
Math functions perform math operations repeated in a software application. Higher CPU/vector overhead.
div, sin, cos, tan, arctan, sqrt
Kernel-level functions offload specific, high-value functions of a software application for a balance of flexibility and performance. Balanced CPU/vector overhead.
raster, fft
Application-level functions offload significant portions of a software application for maximum performance. Low CPU/vector overhead.
lightning, apollo, neptune
These accelerators are orders of magnitude faster than running scalar, vector, or matrix code. Each accelerator has its own buffers, and also uses the cluster-level shared cache when needed.
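The exact opcodes and operand encodings are part of the documentation referenced below, so the snippet here is purely hypothetical: it only illustrates the generic RISC-V mechanism of issuing a custom-opcode instruction from C via the GNU assembler's .insn directive. The opcode fields, and the suggestion that this maps to any particular Zeus accelerator, are placeholders rather than Bolt's actual encoding.

```c
#include <stdint.h>

/* HYPOTHETICAL ENCODING: issues an R-type instruction in the CUSTOM-0
   opcode space (major opcode 0x0B) via the GNU assembler's .insn directive.
   The real Zeus extension opcodes and operand conventions are documented
   separately; this only shows the general mechanism. */
static inline uint64_t zeus_custom_op(uint64_t a, uint64_t b) {
    uint64_t result;
    __asm__ volatile(".insn r 0x0B, 0x0, 0x0, %0, %1, %2"
                     : "=r"(result)
                     : "r"(a), "r"(b));
    return result;
}
```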
For documentation on the RISC-V extensions and how to use these accelerators, please contact us.
Video Encoding & Decoding
Coming Soon!
Matrix Cores
Coming Soon!
Router
Coming Soon!
Telemetry
Each Zeus cluster includes a localized telemetry engine which uses a separate interface to expose counters and data:
Power status
Fan status
Temperature sensor readings
Device configuration
Device status
Voltage integrity: detect voltage drop & noise
Timing margin: detect degradation
Engine usage
Zeus includes support for OpenBMC modules, which enables Redfish integration with OpenStack, Supermicro Cloud Composer, and various other tools from HPE and Dell.
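Because the interface is standard Redfish over HTTPS, any Redfish client or plain HTTP client can read the telemetry. A minimal sketch using libcurl (ours; the BMC hostname, credentials, and the choice of the standard /redfish/v1/Chassis resource are placeholders):

```c
#include <curl/curl.h>
#include <stdio.h>

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    /* Placeholder BMC address/credentials; /redfish/v1/Chassis is a standard
       Redfish collection that exposes thermal and power resources. */
    curl_easy_setopt(curl, CURLOPT_URL, "https://zeus-bmc.example/redfish/v1/Chassis");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "operator:password");
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L); /* lab setups only */

    CURLcode rc = curl_easy_perform(curl);  /* response body is written to stdout */
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```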
This telemetry data enhances the value of the Zeus GPU by giving operators access to data that helps them monitor performance and predict degradation and failures before an application experiences downtime.
In addition to generating a heartbeat and reporting key telemetry, each cluster also reports link status and latency between clusters (inside and across chiplets), enabling the router, and the application that generates the programmable chiplet routing table, to find optimal paths between clusters and avoid dead links.
Zeus Chiplet
Each Zeus 1c26 chiplet is composed of 32x Zeus clusters connected to each other in a 2D mesh:
Each chiplet includes UCIe interfaces configured to run in AXI streaming mode. This enables clusters on another chiplet to logically appear as if they were on the same chiplet.
The table below lists key specifications of the entry-level Zeus 1c26 chiplet:
| Zeus 1c26 Chiplet | Spec |
|---|---|
| CPU Cores | 32 @ 3 GHz |
| FP64 vector cores | 1,024 |
| Total on-chip caches | 128 MB |
| UCIe interfaces | 320 lanes @ 32 Gbps |
| DDR5 interfaces | 2x 80b @ 8.8 Gbps maximum |
| LPDDR5X interfaces | 8x 32b @ 9.6 Gbps maximum |
| Video Encoders | 2x 8K60 streams |
| Video Decoders | 2x 8K60 streams |
| Display Interfaces | HDMI 2.1a |
The Zeus 1c26 chiplet is connected to an I/O chiplet to convert UCIe to PCIe, providing:
| Zeus 1c26 Chiplet | Spec |
|---|---|
| PCIe Interfaces | 2x PCIe Gen5 x16 |
| Ethernet Interfaces | 400 GbE QSFP-DD (8x56G), RJ-45 Redfish BMC |
Zeus Chiplet Scaling
Zeus packages are configured with 1x, 2x, or 4x Zeus chiplets. The diagrams below show the package configurations:
Memory Latency
Latency to memory has a major impact on workload performance. The chart below shows a benchmark comparing latency across various test sizes, from 4 KiB to 3 GiB:
Due to the inclusion of larger on-chip caches, Zeus experiences lower latency up to 2 MB. When testing larger sizes (forcing access to DRAM), LPDDR5X enables lower latency than GDDR6X and GDDR6.
Zeus PCIe Card
We’re designing two PCIe cards (a single- and dual-slot version) for mass production in 2027, and taking a slightly different approach compared to our competitors:
Unified branding and PCB designs: There are no AIB partners designing/producing different PCBs based on “reference designs” we create. Instead, we design the PCBs ourselves and work with various contract manufacturers (CMs) to produce the cards.
Users no longer need to worry about the quality of different AIB brands
Users no longer see pricing spreads by different AIB brands/tiers
Open PCB designs: We will open access to the PCB schematics and simulations to partners, who can then build all types of cooling mechanisms and accessories for Zeus PCIe cards.
Partners can easily design all sorts of heatsinks, waterblocks, LN2 solutions, etc.
Partners can easily design daughterboards that use Zeus' extra PCIe x16 edge connector, including NVMe storage, other accelerators, and additional I/O such as expanded display interfaces.
Zeus PCIe cards will be available for purchase directly from Bolt and through distributors and retailers.
Zeus Server
The Zeus 4c26-256 package is quite large and relies on full-size DDR5 RDIMMs for expanded memory capacity. As a result, it does not fit in the PCIe or OAM form factor. Instead, we have opted to design motherboards with 4x Zeus 4c26-256 GPUs and to partner with leading CMs to produce them:
Each Zeus 4c26-256 is connected to the others using 2x 800 GbE ports. Each package is also directly connected to up to 8x PCIe Gen5 x4 NVMe devices. As these PCIe interfaces support CXL 3.0, memory expander devices can also be used here.
The Zeus server is not configured with an external CPU host, reducing cost, power consumption, and complexity.
The table below lists key specifications of the 2U server with 4x Zeus 4c26-256 GPUs:
| Bolt Zeus 2U Server | Spec |
|---|---|
| System Peak Power | ~2,000 W |
| FP64 / FP32 / FP16 vector tflops | 80 / 160 / 320 |
| INT16 / INT8 matrix pflops | 5 / 10 |
| On-chip cache | 2 GB |
| Memory | Up to 9,216 GB @ 5.8 TB/s |
| Path Tracing | 1,228 gigarays |
| Video Encoding & Decoding | 32x 8K60 streams (AV1, H.264, H.265) |
| I/O | 8x 800 GbE (OSFP or QSFP-DD) |
Zeus Rack
The Zeus 2U server is configured with 8x 800 GbE ports, enabling massive I/O. There are a few ways to connect servers in a rack to each other. We have found that connecting the servers directly to each other lowers costs, complexity, and power.
The Zeus rack design peaks at around 44 kW, which can be air cooled. We are working on a 1U variant that is liquid cooled, doubling performance, capacity, and power density to almost 90 kW.
Please reach out to us for the server configuration options (memory, NVMe, etc).
In this configuration, half of the 800 GbE ports are used to connect to neighboring servers (in a 2D mesh, inside the rack and across neighboring racks). These can be low-power, inexpensive passive DACs due to the short distances required (2 ft DACs between servers in a rack, and 5-10 ft DACs between racks).
Optical cables are needed to connect each server to one or more switches. The remaining 4x 800 GbE ports provide flexibility in configuring a backend and frontend network in addition to the local 2D mesh.
The table below lists key specifications of the rack with 20x Zeus 2U servers installed:
| Bolt Zeus Rack | Spec |
|---|---|
| Peak Power | ~40,000 W (compute) |
| FP64 / FP32 / FP16 vector pflops | 1.6 / 3.2 / 6.4 |
| INT16 / INT8 matrix pflops | 5 / 10 |
| On-chip cache | 40 GB |
| Memory | Up to 184 TB @ 116 TB/s |
| Path Tracing | 24 terarays |
| Video Encoding & Decoding | 640x 8K60 streams (AV1, H.264, H.265) |
| I/O | 640x PCIe 5.0 x4 |
Software
Zeus adopts the RISC-V standard mainly to integrate with an existing, rapidly growing ecosystem. Emulators like QEMU are widely available and robust, enabling porting and testing work without physical hardware. Various developer boards and SBCs with RVA23-conformant cores are already in development.
Core Software
Core software includes the firmware, bootloader, OS, and drivers:
Bootloader: Coreboot with Linuxboot for enhanced security and portability across existing architectures (x86, Arm)
Firmware: Bolt PLDM-compliant firmware delivery through Redfish
Due to third-party IP licensing, Zeus firmware is not open source. To request access to the source for security analysis, please contact us. We can make the source available under NDA.
Operating Systems
We are working with the Canonical team to expand support for Ubuntu and Ubuntu packages on RISC-V.
Porting & Optimizing
The RISE project stewards the RISC-V software ecosystem: https://riseproject.dev/
Porting software to RISC-V involves a few key steps, and we highly recommend following RISE's optimization guide: https://riscv-optimization-guide.riseproject.dev/.
QEMU
Setting up QEMU with RISC-V is very straightforward and only takes a few minutes: https://risc-v-getting-started-guide.readthedocs.io/en/latest/linux-qemu.html
Help us (and the entire RISC-V ecosystem) out by setting up QEMU and trying to compile and run code that is interesting to you!
Compiler Toolchain
LLVM
For more information on LLVM & RISC-V: https://llvm.org/docs/RISCVUsage.html
gcc
For more information on gcc & RISC-V: https://github.com/riscv-collab/riscv-gnu-toolchain
Languages & Runtimes
C & C++
Use LLVM or gcc to compile your code to RISC-V.
Julia
RISC-V support is ongoing: https://github.com/JuliaLang/julia/issues/57569
Python
Python 3 installs on RISC-V, with no special changes required to the package! An updated list of available packages: https://gitlab.com/riseproject/python
Fortran
Fortran has been supported on RISC-V for 4+ years (through flang-new and flang): https://blog.llvm.org/posts/2025-03-11-flang-new/
GPU APIs
Cuda
Coming Soon!
OpenCL
Coming Soon!
Vulkan
Coming Soon!
DirectX
Coming Soon!
OpenGL
Coming Soon!
Telemetry & Debugging
Coming Soon!