Question 3

Domain 2: Core Machine Learning, AI, and Transformer Foundations

Why is Xavier (Glorot) initialization commonly used for initializing weights in deep neural networks for NLP tasks?
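The property behind the question can be checked numerically. The sketch below makes several assumptions: plain Python, purely linear layers with no bias, and a zero-mean, unit-variance input. Glorot's uniform variant draws weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). That gives Var(W) = 2 / (fan_in + fan_out), a compromise between keeping forward activations stable (which wants 1/fan_in) and keeping backward gradients stable (which wants 1/fan_out). The helper names here are hypothetical and are not any framework's API. PyTorch, for example, provides this scheme as `torch.nn.init.xavier_uniform_`. The sketch pushes the same input through eight layers twice, once with Xavier weights and once with a scale-blind U(-1, 1) baseline, and compares the output variance.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform: W ~ U(-a, a), a = sqrt(6 / (fan_in + fan_out)),
    so Var(W) = a^2 / 3 = 2 / (fan_in + fan_out). Returns a fan_out x fan_in matrix."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


def naive_uniform(fan_in, fan_out, rng):
    """Hypothetical baseline that ignores layer size: W ~ U(-1, 1), Var(W) = 1/3."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(fan_in)] for _ in range(fan_out)]


def matvec(w, x):
    # One linear layer without bias: y_i = sum_j w_ij * x_j
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def output_variance(init, width=128, depth=8, seed=0):
    """Feed a unit-variance input through `depth` square linear layers built
    with `init` and return the variance of the final activations."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        x = matvec(init(width, width, rng), x)
    return variance(x)


xavier_var = output_variance(xavier_uniform)
naive_var = output_variance(naive_uniform)
print(f"Xavier init:   output variance after 8 layers ~ {xavier_var:.3g}")
print(f"U(-1, 1) init: output variance after 8 layers ~ {naive_var:.3g}")
```

Each layer multiplies the activation variance by roughly fan_in * Var(W). With Xavier weights and equal fan-in and fan-out, that factor is about 1, so the signal stays on the same scale with depth. With U(-1, 1) weights, the factor is 128/3 per layer, and the variance explodes after a few layers. Deep sequence models need stable activation and gradient scales to train, and this is the behavior that makes Xavier scaling a common default for them. The derivation assumes roughly linear, symmetric activations such as tanh near zero. For ReLU networks, He initialization is the usual alternative.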