Vision_Transformers. In this project, the Transformer architecture is applied to detect and localize objects in images. The dataset used is Caltech 101 from the