Tommaso Galliena
I am a PhD student in the PhD Program of National Interest in Robotics and Intelligent Machines (phDRIM)
at the Italian Institute of Technology, where I work across the
Pattern Analysis and Computer Vision and
Humanoid Sensing and Perception labs under the supervision of
Lorenzo Natale,
Alessio Del Bue, and
Pietro Morerio.
My research sits at the intersection of 3D scene understanding,
vision-language models, and self-supervised learning.
I am interested in building embodied agents that can develop persistent, spatially grounded representations
of the world, enabling consistent object understanding, reasoning, and navigation across viewpoints and time.
Previously, I was a Visiting Research Student at Simon Fraser University in Vancouver, Canada,
where I worked in the 3D Language and Generation lab under the supervision of
Prof. Angel Xuan Chang.
Since March 2026, I have been a Research Intern in the Geometric Deep Learning group at
NAVER LABS Europe, supervised by Gabriela Csurka and Vassilina Nikoulina,
where I work on scalable representations that bridge vision, geometry, and language.
Email /
CV /
Scholar /
Twitter /
LinkedIn /
GitHub
Memory-Augmented Vision-Language Agents for Persistent and Semantically Consistent Object Captioning
Tommaso Galliena,
Tommaso Apicella,
Stefano Rosa,
Pietro Morerio,
Alessio Del Bue,
Lorenzo Natale
Preprint, 2026
project page
/
arXiv
We rethink captioning as a memory-driven embodied process, where agents actively explore and refine object descriptions to achieve cross-view semantic consistency.
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena,
Tommaso Apicella,
Stefano Rosa,
Pietro Morerio,
Alessio Del Bue,
Lorenzo Natale
ICCV, 2025 (Highlight)
project page
/
arXiv
We propose a novel self-supervised learning framework for embodied image captioning, where an agent explores a 3D environment to generate spatially coherent image descriptions and collect challenging training data to fine-tune vision-language models.
Semiautomatic volume measure of kidney vascular territories on CT angiography to plan aortic aneurysm repair in patients with horseshoe kidney
Axel Bartoli,
Alberto Colombo,
Francesco Pisu,
Tommaso Galliena,
Chiara Gnasso,
Enrico Rinaldi,
Germano Melissano,
Anna Palmisano,
Antonio Esposito
European Radiology, 2024
Paper
By developing a semiautomatic CTA-based model to measure kidney vascular territories, we enable precise preoperative planning for aortic aneurysm repair in patients with horseshoe kidney, reducing the risk of postoperative renal damage.
Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.