TurboQuant breakthrough: Google's TurboQuant compresses LLM KV-cache up to 6x without quality loss, freeing GPU memory and boosting inference speed. Hybrid attention savings: DeltaNet-style ...
Major privacy violations: Regulators found OpenAI collected sensitive personal data, including health and political details, without consent when training ChatGPT ...