Google debuts smaller, faster, and improved version of EfficientNet, dubbed EfficientNetV2
=============================================================
In a significant leap forward for image recognition, Google researchers Mingxing Tan and Quoc V. Le have developed and published a new model named EfficientNetV2. It outperforms previous models on a range of datasets while training substantially faster.
EfficientNetV2, built upon the success of its predecessor EfficientNet, employs an improved training approach called progressive learning. The technique increases regularisation strength in step with image size during training: the model starts on small images with weak regularisation and gradually moves to larger images with stronger regularisation, which lets it adapt to larger inputs without the accuracy drop that naive progressive resizing causes.
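The idea can be sketched in a few lines. The schedule below linearly interpolates image size and two regularisation knobs across training stages; the stage count and the endpoint values (image sizes, dropout rate, RandAugment magnitude) are illustrative placeholders rather than the paper's exact settings:

```python
def progressive_schedule(stage, num_stages=4,
                         size_min=128, size_max=300,
                         dropout_min=0.10, dropout_max=0.30,
                         randaug_min=5, randaug_max=15):
    # Training progress in [0, 1] across the stages.
    t = stage / max(num_stages - 1, 1)
    # Image size and regularisation strength grow in lockstep:
    # small images get weak regularisation, large images get strong.
    image_size = int(size_min + t * (size_max - size_min))
    dropout_rate = dropout_min + t * (dropout_max - dropout_min)
    randaug_magnitude = randaug_min + t * (randaug_max - randaug_min)
    return image_size, dropout_rate, randaug_magnitude

for stage in range(4):
    print(f"stage {stage}: {progressive_schedule(stage)}")
```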
One of the key architectural features of EfficientNetV2 is the Fused-MBConv layer, designed to better utilise modern accelerators such as GPUs and TPUs. In the early stages of the network, it replaces the depthwise 3x3 convolution and expansion 1x1 convolution of the regular MBConv block with a single standard 3x3 convolution, trading extra floating-point operations for much better hardware utilisation and therefore higher training speed.
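To make the difference concrete, here is a minimal Keras sketch of both block types. It omits the squeeze-and-excitation and stochastic-depth components the real blocks use, and the channel counts in the example input are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

def mbconv(x, out_ch, expand_ratio=4, kernel=3, stride=1):
    # Regular MBConv: 1x1 expansion -> depthwise 3x3 -> 1x1 projection.
    in_ch = x.shape[-1]
    h = layers.Conv2D(in_ch * expand_ratio, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    h = layers.DepthwiseConv2D(kernel, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    h = layers.Conv2D(out_ch, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    if stride == 1 and in_ch == out_ch:
        h = layers.Add()([x, h])  # residual connection
    return h

def fused_mbconv(x, out_ch, expand_ratio=4, kernel=3, stride=1):
    # Fused-MBConv: the 1x1 expansion and depthwise 3x3 are fused
    # into one standard 3x3 convolution, which maps better onto
    # dense matrix hardware despite costing more FLOPs.
    in_ch = x.shape[-1]
    h = layers.Conv2D(in_ch * expand_ratio, kernel, strides=stride,
                      padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)
    h = layers.Conv2D(out_ch, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    if stride == 1 and in_ch == out_ch:
        h = layers.Add()([x, h])
    return h

inputs = tf.keras.Input(shape=(56, 56, 24))
model = tf.keras.Model(inputs, fused_mbconv(inputs, out_ch=24))
model.summary()
```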
The new model has demonstrated impressive results on several benchmarks, outperforming previous models on ImageNet as well as on the CIFAR-10, CIFAR-100, Cars, and Flowers transfer datasets. Notably, when pretrained on ImageNet21k, EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources.
On ImageNet, EfficientNetV2 models also train 3x-9x faster than state-of-the-art models while being up to 6.8x smaller, making them a more accessible and efficient choice for developers.
The authors also address a long-standing tension in Neural Architecture Search (NAS): reducing parameter count and training cost without lowering accuracy. Their training-aware NAS jointly optimises accuracy, parameter efficiency, and training speed while searching for the best per-stage combination of Fused-MBConv and regular MBConv layers. A modified scaling rule additionally caps the maximum image size to avoid out-of-memory issues when scaling models up.
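The search objective itself is compact: the paper scores each candidate with a weighted product of accuracy A, normalised training step time S, and parameter size P, using exponents w = -0.07 and v = -0.05. The candidate numbers below are invented purely for illustration:

```python
def nas_reward(accuracy, step_time, params, w=-0.07, v=-0.05):
    # A * S^w * P^v: the negative exponents penalise slow steps (S)
    # and large parameter counts (P) without overwhelming accuracy (A).
    return accuracy * (step_time ** w) * (params ** v)

# Hypothetical candidates: (top-1 accuracy, step time in s, params in millions).
candidates = [(0.830, 0.12, 24.0), (0.840, 0.20, 55.0), (0.825, 0.09, 21.0)]
best = max(candidates, key=lambda c: nas_reward(*c))
print("selected candidate:", best)
```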
Interestingly, the authors point out that not all stages benefit equally from scaling, so later stages receive proportionally more layers. The search also favours smaller kernel sizes compensated by additional layers, and swapping regular MBConv for Fused-MBConv in the early stages yields better performance from smaller, faster models.
The main efficiency metrics here are FLOPs (the number of floating-point operations a model's forward pass requires, not operations per second) and the parameter count. EfficientNetV2 strikes a deliberate balance between the two, offering a more efficient and effective solution for image recognition tasks.
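As a back-of-the-envelope illustration of why fused blocks can win despite costing more FLOPs, the helper below counts operations for a single fused 3x3 expansion versus the 1x1-expansion-plus-depthwise-3x3 pair it replaces; the feature-map shapes are hypothetical:

```python
def conv2d_flops(h_out, w_out, c_in, c_out, kernel=3):
    # Two ops (one multiply, one add) per multiply-accumulate.
    return 2 * h_out * w_out * c_out * (kernel * kernel * c_in)

def depthwise_flops(h_out, w_out, channels, kernel=3):
    # Depthwise convolution treats each channel independently, so
    # the c_in * c_out product collapses to a single channel count.
    return 2 * h_out * w_out * channels * (kernel * kernel)

# Expansion step of a block at 56x56 resolution, 24 -> 96 channels.
fused = conv2d_flops(56, 56, 24, 96)                 # one regular 3x3
split = (conv2d_flops(56, 56, 24, 96, kernel=1)      # 1x1 expansion
         + depthwise_flops(56, 56, 96))              # depthwise 3x3
print(f"fused 3x3: {fused:,} FLOPs vs expand+depthwise: {split:,} FLOPs")
```

The fused version does several times more raw FLOPs here, yet tends to run faster in practice because a dense 3x3 convolution keeps matrix units busy where a depthwise convolution cannot.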
The code for EfficientNetV2 will be available at https://github.com/google/automl/efficientnetv2, allowing researchers and developers to experiment with and build upon this groundbreaking model.