JuiceFS Enterprise Edition is a parallel file system built on object storage. For this test, it was deployed in the cloud with object storage as the persistent data layer, a three-node metadata cluster, and a multi-node distributed cache cluster. In the BERT test, JuiceFS maintained over 98% GPU utilization in training at a scale of 1,000 GPUs. In the UNet3D test, it maintained over 97% GPU utilization in training with nearly 500 GPUs. The main advantage of the distributed cache is its scalability: as cache nodes are added, the read bandwidth of the overall storage system increases.
Key takeaways:
- In September 2023, MLPerf introduced its Storage Benchmark, a large-scale benchmark for measuring storage system performance in AI model training scenarios.
- High-performance storage vendors such as DataDirect Networks, Nutanix, and Weka, as well as Argonne National Laboratory, released MLPerf Storage results that serve as industry references.
- JuiceFS Enterprise Edition, a high-performance distributed file system, maintained GPU utilization of over 97% for UNet3D at a scale of nearly 500 GPUs and over 98% for BERT at a scale of 1,000 GPUs.
- JuiceFS uses a distributed cache to substantially increase the system's I/O throughput and stores data in inexpensive object storage, which makes it well suited to the combined performance and cost requirements of large-scale AI applications.
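The GPU utilization figures above are easier to interpret with the underlying arithmetic in mind. The sketch below assumes the MLPerf Storage convention of accelerator utilization as the share of total benchmark run time that the (emulated) accelerator spends computing rather than waiting on I/O. The function name and the sample timings are hypothetical and are not taken from the JuiceFS results.

```python
from typing import Sequence


def accelerator_utilization(compute_times: Sequence[float], total_runtime: float) -> float:
    """Return utilization as a percentage.

    Assumption: utilization = total compute time / total run time, as in the
    MLPerf Storage accelerator-utilization metric. compute_times holds the
    hypothetical per-step compute durations (seconds) for one accelerator;
    total_runtime is the wall-clock duration of the run, including time
    spent waiting on storage.
    """
    if total_runtime <= 0:
        raise ValueError("total_runtime must be positive")
    return 100.0 * sum(compute_times) / total_runtime


# Hypothetical run: 98 s of compute inside a 100 s run leaves 2 s stalled on I/O.
print(accelerator_utilization([40.0, 58.0], 100.0))
```

Under this definition, a utilization above 98% means the storage system kept the accelerators waiting for data less than 2% of the time.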