Res2Net: a new multi-scale backbone architecture
Human vision generally operates at multiple scales, which is why representing features at multiple scales is important for many vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains across a wide range of applications. However, most existing methods represent multi-scale features in a layer-wise manner.
Res2Net: a new multi-scale backbone architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 2, pp. 652–662, 2021