Together AI

Cloud platform for training, fine-tuning, and serving open-source models like Llama and DeepSeek. Offers serverless inference endpoints, dedicated GPU clusters on Blackwell hardware, and custom pre-training. Research team behind FlashAttention and ThunderKittens; inference stack uses speculative decoding and custom GPU kernels for low-latency, high-throughput serving.

