Naver’s biggest asset is the wealth of Korean reviews that users generate in their Naver blogs for all kinds of movies, products, and restaurants. I recently had to work in
Limitations:
No “fraction of GPU” per pod
Our service GPUs (V100) didn’t support MIG, so each of our inference service pods had to be assigned a whole GPU; Kubernetes only accepts integer counts for the nvidia.com/gpu resource (see the pod-spec sketch below).
Limited GPU Instances
Our team had 8 nodes of V100 GPUs, which was more than enough for a few years.
However, as the number of services our team provided kept growing, we were running out of available V100s, and it became apparent that we had to use our limited resources more efficiently. While some of our models were in the billion-parameter range, even those weren’t big enough to push a single V100 to full capacity. A lot of compute was being wasted by dedicating one whole GPU to each model.
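For readers less familiar with how GPUs are requested in Kubernetes, here is a minimal sketch of a pod spec using the official kubernetes Python client. The pod, container, and image names are hypothetical; the point is that the nvidia.com/gpu resource only takes whole integers, so a fractional request like 0.5 is rejected outright.

```python
from kubernetes import client

# Minimal sketch of a pod requesting one whole GPU.
# Kubernetes only accepts integer values for nvidia.com/gpu,
# so there is no way to ask for, say, half of a V100 here.
container = client.V1Container(
    name="inference-server",                # hypothetical container name
    image="our-registry/inference:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}      # whole GPUs only; "0.5" is invalid
    ),
)
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-pod"),  # hypothetical name
    spec=client.V1PodSpec(containers=[container]),
)
```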
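And here is a rough sketch of how one might confirm the underutilization described above, assuming the nvidia-ml-py (pynvml) bindings are installed on the node. The sampling interval and duration are arbitrary choices for illustration, not values we actually used.

```python
import time
import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)

# Sample GPU utilization while the inference service handles traffic.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU on the node

samples = []
for _ in range(60):  # one sample per second for a minute
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    samples.append(util.gpu)  # percent of time the GPU was busy
    time.sleep(1)

print(f"mean GPU utilization: {sum(samples) / len(samples):.1f}%")
pynvml.nvmlShutdown()
```

A consistently low mean here is the signal that a model is leaving most of its dedicated GPU idle.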