Study program / study programs: Advanced data analytics
Course: Big Data in space science and its analysis
Teacher(s): Luka Č. Popović, Andjelka Kovačević, Dragana Ilić
Course status: Elective
ECTS points: 7
Prerequisites:
Course objective:
Large amounts of new data from space research are collected daily by ground-based and space-based telescopes and by missions that observe Earth from space (e.g. the Copernicus programme satellites). Satellite Earth observation data support a wide range of human activities, from sociological (migration monitoring), biological, industrial and telecommunication applications to the study of climate change.
The goal of this course is to introduce students to the types of data obtained from space research and to give a broad, practical introduction to big data: data analysis techniques, including databases, data mining, machine learning and data visualization, and data analysis tools, including SQL and Python. The tools and techniques are practical and provide a foundation for future research and applications.
Learning outcomes:
The student is able to apply tools and techniques for processing big data in their own research areas and in potential applications in the space industry.
Course structure and content:
Introduction: Methods and techniques of collecting data in astronomy using telescopes and satellites. Methods of collecting satellite data for Earth observation. The aims of these observations and their use in research and in practice. Introduction to large databases and their organization. Platforms for large databases and storage of big data. Big data in space science. Large data surveys and providers in space science: LSST, ELT, Gaia, SDSS, etc. Database mining with SQL and Python, introduction to the Flexible Image Transport System (FITS), mean and median of FITS images, efficient comparison of data from different databases (cross-matching), displaying big data from Earth observation satellites: visualization of large data sets on a map.
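As an illustration of the cross-matching topic, the following is a minimal sketch that pairs sources from two small catalogues by angular separation on the sky. The catalogue contents, the function names `angular_separation_deg` and `cross_match`, and the 2 arcsecond matching radius are assumptions made for this example and are not course material. The brute-force loop is only suitable for small inputs.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a))))


def cross_match(catalog_a, catalog_b, max_sep_deg):
    """For each source in catalog_a, find the nearest source in catalog_b
    within max_sep_deg. Brute force, O(N*M). Returns (id_a, id_b, sep_deg) tuples."""
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= max_sep_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1]))
    return matches


# Hypothetical catalogues: (identifier, right ascension, declination) in degrees.
catalog_a = [("A1", 10.0000, -5.0000), ("A2", 150.0000, 2.2000)]
catalog_b = [("B1", 10.0002, -5.0001), ("B2", 200.0000, 45.0000)]

matches = cross_match(catalog_a, catalog_b, max_sep_deg=2.0 / 3600.0)  # 2 arcsec
for id_a, id_b, sep in matches:
    print(f"{id_a} <-> {id_b}: {sep * 3600:.2f} arcsec")
```

For survey-sized catalogues a spatial index is normally used instead of the double loop; for example, astropy's `SkyCoord.match_to_catalog_sky` performs nearest-neighbour matching on the sky.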
Dimensionality Reduction: PCA, kernel PCA, PCA as a noise filter, introduction to Scikit-Learn, hyperparameters and model validation, best model selection, categorical and image features, imputation of missing data, Bayesian classification, regression, classification and clustering, machine learning in Python. Data mining algorithms, training models, Support Vector Machines applied to recognizing parts of complex images, decision trees and random forests with applications, kernel density estimation applied to recognizing parts of complex images, final project in machine learning in space science.
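To illustrate PCA as a noise filter, the sketch below projects noisy data onto its leading principal components and reconstructs it, which discards most of the noise in the remaining directions. The synthetic signals, the noise level, and the helper name `pca_denoise` are assumptions for this example. It uses NumPy's SVD directly rather than a specific library's PCA class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 200 signals of 50 points each,
# all lying in a 2-dimensional subspace spanned by a sine and a cosine.
t = np.linspace(0.0, 1.0, 50)
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])  # shape (2, 50)
coeffs = rng.normal(size=(200, 2))
clean = coeffs @ basis
noisy = clean + 0.3 * rng.normal(size=clean.shape)


def pca_denoise(X, n_components):
    """Project X onto its first n_components principal components
    and map the result back to the original feature space."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return (Xc @ components.T) @ components + mean


denoised = pca_denoise(noisy, n_components=2)

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE before: {err_noisy:.4f}, after: {err_denoised:.4f}")
```

With Scikit-Learn the same steps correspond to fitting `sklearn.decomposition.PCA` with a chosen `n_components`, then calling `transform` followed by `inverse_transform`. The number of components to keep is a hyperparameter that in practice is chosen from the explained variance or by validation.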
Literature/Readings:
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2017, O’Reilly Media
Jake VanderPlas, Python Data Science Handbook, 2017, O’Reilly Media
The number of class hours per week:
Lectures: 5
Labs: 0
Workshops: 0
Research study: 3
Other classes: 0
Teaching methods: Individual and group work; lectures and labs
Evaluation/Grading (maximum 100 points):
Pre-exam requirements: 70 (Class activity – 10, Hands-on activity – 60)
Final exam (Tests / Final exam in writing): 30