Netflix senior engineer Tejas Chopra has released an open-source tool designed to slash the cost of AI usage by stripping redundant data from prompts before they are sent to large language models. The project, dubbed Headroom, aims to address the ballooning costs of token-based pricing that have recently burdened companies like Uber and Microsoft.
According to The Register, Chopra built the tool after running up a $287 bill on a personal project that used Claude Sonnet. When he investigated, he found that most of his token consumption came from machine metadata, nested JSON schemas, and repetitive database columns rather than from his actual creative input. Chopra estimates that up to 90% of the tokens sent to LLMs are redundant.
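To see why repetitive structured data inflates prompts, consider a tool that returns a list of JSON records sharing identical keys. The sketch below is a generic illustration, not Headroom's actual algorithm: a hypothetical `to_columnar` helper lists the keys once and then sends only the values for each row. It compares serialized length in characters as a rough stand-in for tokens, since real token counts depend on each model's tokenizer.

```python
import json


def to_columnar(records: list[dict]) -> dict:
    """Re-encode uniform records as one list of column names plus value rows.

    Hypothetical helper for illustration only. Assumes every record has the
    same keys as the first one; a missing key raises KeyError.
    """
    if not records:
        return {"columns": [], "rows": []}
    columns = list(records[0].keys())
    rows = [[record[col] for col in columns] for record in records]
    return {"columns": columns, "rows": rows}


# Hypothetical tool output: the same three keys repeat on every record.
records = [
    {"customer_id": i, "status": "active", "region": "us-east-1"}
    for i in range(50)
]

# Use the same compact separators for both encodings so the only
# difference is that keys are no longer repeated per record.
verbose = json.dumps(records, separators=(",", ":"))
compact = json.dumps(to_columnar(records), separators=(",", ":"))

print(f"row-oriented: {len(verbose)} chars, columnar: {len(compact)} chars")
```

The columnar form carries the same information, since each original record can be rebuilt with `dict(zip(columns, row))`, but the repeated key names are paid for only once.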
Optimizing the context window
Headroom runs as a proxy on a developer’s local machine, compressing conversation history, logs, and tool outputs before they reach the LLM. Model providers do offer their own prompt-caching options, but Chopra noted that these are often difficult for end users to navigate and can be prohibitively expensive, because writing content to the cache costs more than a normal request and the savings arrive only on later reads. "You pay two times the cost for your writes to get 90% savings for your reads," Chopra told attendees at the Open Source Summit last week.
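The trade-off in Chopra's quote can be made concrete with a little arithmetic. The sketch below assumes, as the quote suggests, that writing a prompt prefix to the cache costs 2x the base input price, that each later cache hit costs 10% of it, and that the cached entry never expires between requests. The function names are hypothetical and are not part of any provider's API.

```python
def cost_without_cache(prefix_tokens: int, uses: int, price_per_token: float) -> float:
    """Every request pays the full base input price for the shared prefix."""
    return prefix_tokens * uses * price_per_token


def cost_with_cache(
    prefix_tokens: int,
    uses: int,
    price_per_token: float,
    write_mult: float = 2.0,  # assumption: cache writes cost 2x base price
    read_mult: float = 0.1,   # assumption: cache reads cost 10% of base price
) -> float:
    """First request writes the prefix to the cache; later requests read it.

    Assumes the cache entry never expires between requests.
    """
    if uses <= 0:
        return 0.0
    return prefix_tokens * price_per_token * (write_mult + read_mult * (uses - 1))


def break_even_uses(write_mult: float = 2.0, read_mult: float = 0.1, max_uses: int = 1000):
    """Smallest number of requests for which caching is strictly cheaper."""
    for uses in range(1, max_uses + 1):
        cached = cost_with_cache(1, uses, 1.0, write_mult, read_mult)
        uncached = cost_without_cache(1, uses, 1.0)
        if cached < uncached:
            return uses
    return None  # caching never pays off within max_uses requests


if __name__ == "__main__":
    print(break_even_uses())                 # 2x writes, 90% off reads -> 3
    print(break_even_uses(write_mult=1.25))  # cheaper writes -> 2
```

Under these assumptions, caching only pays off once the same prefix is sent at least three times, so short or one-off sessions may see little benefit from it.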
Although Headroom is not an official Netflix initiative, several teams within the company have adopted it, and it has gained traction in the broader developer community: since its launch in January, the project has garnered 2,000 stars and more than 120 forks on GitHub. Chopra emphasized that the tool's primary motivation is the financial strain that token costs place on individual developers.
"A lot of our users are people who have been really burned by token costs, more than anything else," Chopra said during his presentation. According to The Register, users of the tool have collectively saved an estimated 200 billion tokens, which can now go toward other tasks.