Data-driven Historical Stats from the FIFA World Cup between 1930 and 2014 :
In the second part of our article series on data modeling, we are going to use our previously created FIFA World Cup Analysis Pod to create Reports and Workbooks. Creating analyses and reports in Repods, we will be going deeper into the FIFA dataset to glean facts about an event that unites people from across the globe and is followed by over a billion soccer fans.
Before we start, you might want to check out the first steps laid out in Part I of this article series:
You can also check out the live version of our Data Pod here.
Repods has a powerful reporting tool that includes SQL Workbooks. The reporting tool allows us to navigate through the data and its relations in an intuitive tree structure.
Inside the Analysis (Alt+4) section, we can have a look at some of the pre-created reports as shown below:
All Reports and Workbooks are listed from the newest to the oldest.
The Reports section contains a tree-structure overview that enables us to navigate through relations and generate queries.
The Workbooks section allows us to add text (plain text, markdown, html), execute, visualize SQL queries.
To start, let’s take the World Cup Winners as an example. As we can see in the figure below, there are four main areas that will help us during the reporting process:
- The top bar shows the Pod name, a search panel, as well as the current user and other users online.
- Just below the top bar, you can set the time range for our analysis. Here we can select the beginning and the time scale (per century, decade, year, quarter, month, week, day, hour, minute or second).
- In the center, there are all the available data entities in the Pod. We can click on each of them and they will show all the attributes they have. To add an attribute to a report, we can click on it and can choose one of the offered report options.
- At the bottom, we can also see the generated chart, the respective result, and the automatically created SQL query of the respective result.
For instance, we want to know all the countries that won the World Cup and the number of times they have won the World Cup. Therefore, we can simply click on the World Cup entity, select the field Winner, and apply two of the available options: Detail (Group by) and Count.
Repods places all data into the time dimension. So we will always see data related to a time axis. When selecting an attribute, there are two different categories of options to choose from — State Related or Point in Time. Not all available options are shown as these depend on the data type and kind of attribute.
In the State Related option, we can see information about a whole period of time, including the changes throughout the period. For instance, if we want to count the stadiums per city within a year, we would still count a stadium that is demolished mid-year even though it does not exist anymore at the end of the year.
In a Point in Time option, we can analyze the data at an exact point in time, like a cut or snapshot of the data to the precision of milliseconds. Point in Time transformations are typically easier for the database but they are usually meaningless for event tables.
In the next step, let’s have some fun finding out about the World Champions!
Incrementing with More Data
In our previous article, we have used FIFA data only but some important information is missing. For instance, we have the venues where the matches have taken place and the cities, but there is no dataset correlating the cities with the countries. So let’s import the dataset World Major Cities (provided by Simple Maps here).
After importing this data, we need to adjust our model to receive the new data. We can simply go to Core Data (Alt+3) select the City entity, and add the fields Province/State, Latitude and Longitude.
With the model ready, we can go back to Pipes (Alt+2) and create a new pipe to load the cities. In the Pipe Transformation, modify the auto-generated fields according to Code Snippet 5.1.
SELECT r1."city_ascii" as "Name", '-infinity'::timestamptz as VALID_FROM, 'infinity'::timestamptz as VALID_TO, r1."iso3" as "Country", r1."province" as "Province/State", r1."lat" as "Latitude", r1."lng" as "Longitude" FROM S_WORLDCITIES_BASIC r1
Code Snippet 5.1. Updating the Cities Information with External Data
To speed up, let’s complete the remaining Core Tables with the data we have available in the Raw Tables. You can find all the Pipe Transformations below:
Now that we are done with Reports, let’s play with Workbooks in the next section.
Playing with Workbooks
With all the data at hand (or a copy from existing Pod), let’s run a short overview to see all countries that have ever won the World Cup. We have the online version here, but we recommend that you create your own Pod to experiment with.
Let’s open Analysis(Alt+4) and click on Add Workbook. Here we can name the Workbook and start writing SQL queries immediately.
Every new Workbook comes with a *select ‘hello world’* statement and the option to add new queries or Markdown text. It looks pretty boring when empty but we can make it incredibly nice with some simple information.
To start, let’s add some information about what we want to see. Click on the Tt icon to add text and add the content of Code Snippet 6.1:
As we can see, this piece of code adds a picture and a title with HTML and a text using markdown syntax. Just below, we can add a small SQL Query to retrieve the World Cup info about the winner countries:
SELECT “ID” as “Year”, “Winner”, “Second”, “Third”, “Fourth”, “Country”, “Name” as “Host Country”, “Matches Played”, “Goals Scored”, “Attendance” FROM T_WORLDCUP WHERE “Winner” = ‘Brazil’
Code Snippet 6.2. SQL Query Example for World Cups Where Brazil Won
As we can see in Code Snippet 6.2, the target tables must have a prefix “T_” that identifies that we have used the target transformed table to search. You can learn more about it in our first article.
The result of that small piece of code can be seen below:
We can simply press the play button on the side to execute the query and see the desired information. It is also possible to hide/show the query and resize the panel.
Let’s create more panels to include some other World Champions: Germany, Italy, Argentina, Uruguay, England, France, and Spain.
Less than three decades ago, Germany was two countries. So it is a tricky query to be able to include the victories from the former Germany FR.
SELECT “ID” as “Year”, “Winner”, “Second”, “Third”, “Fourth”, “Country”, “Name” as “Host Country”, “Matches Played”, “Goals Scored”, “Attendance” FROM T_WORLDCUP WHERE “Winner” = ‘Germany’ OR “Winner” = ‘Germany FR’ OR “Winner” = ‘Germany DR'
Code Snippet 6.3. SQL Query for German Victories
Adding Other Queries
We can proceed with the SQL queries for the other countries as shown in Code Snippet 7.1 — Champions Selections:
We can also take a look at the HTML/Markdown to include their flags and infos. It gives the Workbook a more refined look and helps to explain the information context. You can get all of them here.
That’s it for now! Once you have learned how to create analyses and reports in Repods, we can start creating infographics for a more interactive data analysis. You can learn more about this in our article Data Visualization with Repods.
Repods is a data platform where you create and manage Data Pods. Data Pods are compact data warehouses with flexible Storage, vCores, and memory plus all the tooling needed for complex data analyses.
Don’t forget to sign up for free at Repods. You can find an online version of the Data Pod used on this tutorial here.
Subscribe to our blog!
Stay up to date with the latest data science and IoT tips and news.