In this study we explore the potential of Large Language Models (LLMs) to interact with Earth Observation (EO) data, with promising first results. We source a first test set for the BIG-bench framework. With more than 20 subtasks, we aim for a wide variety of questions that are currently beyond the capabilities of LLMs and other models. Furthermore, we implement a methodology for testing proprietary state-of-the-art models whose token distributions may not be available. Finally, we set a new baseline performance with our geospatial intelligent agent, GIA, which improves the ability of LLMs to reason and enables them to use tools. This activity serves as a first step towards measuring the capabilities of LLMs in an EO setting, establishing various performance metrics, and setting a baseline.