How to 10x Throughput When Serving Hugging Face Models Without a GPU
By optimising how a model is served, we serve over 100 predictions per second with a simple Python API using CPU inference
Over the past two years, there has been a steady increase in investment in Machine Learning initiatives. When we started…