BIG DATA

This is a free program worth 2 ECTS credits. It requires students to attend 16 hours of live online classes (see the time schedule below) and to complete some additional autonomous work at home.

Once the synchronous edition of this course ends in late November 2025, the course will be offered in asynchronous mode for those who are unable to follow the synchronous sessions during the scheduled period.

Each participant who completes this course, either synchronously or asynchronously, will receive a personalised certificate from EIT.

Please note that registration is currently open only for the synchronous mode; registration for the asynchronous mode will open by November 10th.

Summary of the course

This introductory course on Big Data offers a concise overview of the fundamental concepts and technologies in large-scale data management. It covers the 5 V’s of Big Data, the role of Hadoop, Spark, NoSQL databases, and Data Lakes, as well as basic approaches to batch and streaming processing. Participants are introduced to core methods of data analysis and visualization, alongside critical discussions on governance, privacy (GDPR/LGPD), and ethical issues. Designed as a foundation, the course provides essential knowledge for students and professionals preparing to advance in the field of Big Data.

Program of themes
  • What is Big Data? The 5 V’s (Volume, Velocity, Variety, Veracity, Value)
  • Challenges and Opportunities of Big Data
  • Data vs. Information vs. Knowledge
  • Big Data in the Current Context (trends, use cases)
  • Overview of the Hadoop Ecosystem (HDFS, YARN, MapReduce)
  • Introduction to NoSQL Databases (MongoDB, Cassandra, Neo4j – concepts and uses)
  • Real-Time Processing Tools (Kafka, Spark Streaming – introduction)
  • Concepts of Data Lakes and Data Warehouses
  • Hadoop Distributed File System (HDFS): Architecture and Basic Operations
  • Apache Spark: Core Concepts (RDDs, DataFrames, Spark SQL) and Applications
  • Batch vs. Stream Processing: Differences and Use Cases
  • Scalability and Fault Tolerance Challenges
  • Principles of Large-Scale Data Analysis
  • Introduction to Python for Data Analysis (Pandas, NumPy – brief overview)
  • Machine Learning in Big Data (overview of algorithms, examples)
  • Data Visualization Tools (Tableau, Power BI, or similar – concept introduction)
  • Data Quality and Governance in Big Data Environments
  • Data Security and Privacy (GDPR/LGPD and other regulations)
  • Ethics and Responsibility in the Use of Big Data
  • Compliance and Audit Challenges
  • Copyright, database rights and licensing schemes
  • Assessment of strategies
  • Data collection and visualization
  • Statistical analysis
  • Examples of enhancing organizations' strategies

Time schedule (all times in CET)

Day            Hours
November 4     17:30-19:30
November 6     17:30-19:30
November 11    17:30-19:30
November 13    17:30-19:30
November 18    17:30-19:30
November 24    17:00-18:30
November 25    17:00-18:30
November 27    17:00-18:30
November 28    17:00-18:30

TRAINERS

Filipe Madeira

Themes 1, 2, 3, 4 and 5