Visualise Covid-19 data Using Elastic Stack (Slovakia)



hands-on kibana logstash maps covid-19

Covid-19 and its implications have been the number one topic for quite some time. The situation naturally provides us with a new dataset that any data analyst can process with different tools.

Although this is not a happy dataset, and one I would rather not be part of, it is also an opportunity to use tools, learn new applications, and hope that anything we do will help other people around us in some way. Whether it is a scientist looking for a different approach to the problem or a student learning new tools with data that is relevant right now, everyone can benefit. Because I believe that we learn by doing things, I am presenting a complete hands-on example based on Slovakia’s data. The same methodology can be applied to similar use cases or used as a proof of concept when needed.

The live dashboard of the setup is located at covid-19.radoondas.io, and the source code is in the GitHub repository.

Covid-19 Dashboard

The Data

All over the world, many people, organizations, and companies provide different datasets about the Covid-19 outbreak. Accessibility, quality, and completeness vary from source to source. Because I live in Slovakia, I focus on local data. Unfortunately, we have no official, complete, machine-readable data source provided by government authorities.

Since the beginning of the outbreak, we have had only a few sources available, which were technically merged into one over time: the government webpage dedicated to Covid-19, with data provided by the National Health Information Centre. I am not aware of any open, machine-readable data source, which I consider unfortunate. Another source is news outlets that scrape the official data and possibly enrich it with reports from their own resources. This approach does not make for a good and reliable data source, especially when the data are not published freely. As far as I know, everyone keeps their data to themselves (please correct me if I am wrong).

Because there is no official data source, I decided to put together my own and open it to the public. I believe that the data can be reused, distributed, and enhanced if anyone is willing to do so. The data file is available in the GitHub repository, and I am happy to accept issues or PRs with data enhancements.

It is also worth mentioning other visualizations available in Slovakia, which you can check:

  • korona.gov.sk
  • arcgis
  • every major media outlet also offers some form of visualization for its readers.

Data set description

The data set consists of semicolon-delimited CSV rows with the following header:

date;city;infected;gender;note_1;note_2;healthy;died;region;age;district

The column descriptions below are current as of the publication of this post and may change over time. Please check the GitHub repository for the latest description.

Column name   Description
date          Date of the record
city          City: the location of the person infected by Covid-19
infected      Number of infected
gender        Gender: M - male, Ž - female, D - children, X - unknown
note_1        Note 1
note_2        Note 2
healthy       Number of people who recovered from the virus
died          Number of people who died
region        Region
age           Age
district      District
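
To get a feel for the format before involving the Elastic Stack, the rows can be read with a few lines of Python. This is only a minimal sketch: the header is the one documented above, but the two data rows, the date format, and the choice to treat infected, healthy, died, and age as integers are assumptions made for illustration, not taken from the real data file.

```python
import csv
import io

# Header as documented above. The two data rows are invented for illustration
# only; the real file may use a different date format or different values.
SAMPLE = """date;city;infected;gender;note_1;note_2;healthy;died;region;age;district
2020-03-06;Bratislava;1;M;;;0;0;Bratislavský;52;Bratislava I
2020-03-07;Martin;2;Ž;;;0;0;Žilinský;;Martin
"""

# Assumption: these columns hold whole numbers.
NUMERIC_COLUMNS = ("infected", "healthy", "died", "age")


def parse_rows(text):
    """Parse semicolon-delimited rows into dicts, converting numeric columns to int."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for raw in reader:
        row = dict(raw)
        for column in NUMERIC_COLUMNS:
            value = (row.get(column) or "").strip()
            # Empty cells become None rather than 0, so "unknown" stays distinguishable.
            row[column] = int(value) if value else None
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
total_infected = sum(row["infected"] or 0 for row in rows)
print(len(rows), "rows,", total_infected, "infected in total")
```

Keeping empty cells as None instead of 0 is a deliberate choice here: a missing age is not the same as an age of zero.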

Architecture and Tools

The application stack of choice is the Elastic Stack, which handles both ingestion and analysis of the data.

For those who prefer a visual explanation, the data flow pipeline is simple and straightforward.

Logstash reads the CSV file and indexes documents into Elasticsearch. The user then works with Kibana, connected to Elasticsearch, to view, analyze, and visualize the documents.

+----------+     +----------+     +---------------+     +--------+     +------+
|          |     |          |     |               |     |        |     |      |
| CSV file +-----> Logstash +-----> Elasticsearch <-----> Kibana <-----+ User |
|          |     |          |     |               |     |        |     |      |
+----------+     +----------+     +---------------+     +--------+     +------+
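
Conceptually, the Logstash stage turns each CSV row into a JSON document and sends the documents to Elasticsearch in bulk. The sketch below illustrates only that idea in Python; it is not the repository's Logstash configuration and does not reproduce its field mapping or filters. The function name to_bulk_body and the sample documents are hypothetical; the index name covid-19-sk is the one used later in this post.

```python
import json


def to_bulk_body(rows, index_name="covid-19-sk"):
    """Serialise dicts as an Elasticsearch _bulk body.

    Each document becomes two lines: an action line naming the target index,
    followed by the document source itself.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(row, ensure_ascii=False))
    if not lines:
        return ""
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents, invented for illustration only.
docs = [
    {"date": "2020-03-06", "city": "Bratislava", "infected": 1},
    {"date": "2020-03-07", "city": "Martin", "infected": 2},
]
body = to_bulk_body(docs)
print(body, end="")
```

The resulting text is newline-delimited JSON, which is why bulk requests are sent with the application/x-ndjson content type.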
Geospatial tools

I use Slovakia’s geospatial data, and I published a separate post on how to import Slovakia’s GIS data into Elasticsearch. That post dives deeper into the details of indexing geospatial documents. Please read it to get familiar with the setup, as it will help you with the following tutorial.

If you do not have the GDAL library installed yet, also read the post on how to set up the latest GDAL using a simple wrapper script for the Docker image.

Optional

A working Docker environment, in case you need to set up the GDAL wrapper or run the Elasticsearch cluster using Docker.

How to - tutorial

In the following section, I describe the minimal steps required to build, on your local machine, a dashboard like the live one at covid-19.radoondas.io.

  1. Clone my repository from GitHub to get all the necessary data and configuration files.
   $ git clone https://github.com/radoondas/covid-19-slovakia.git
  2. Navigate to your local copy of the repository.
   $ cd covid-19-slovakia
  3. Make sure that your Elasticsearch cluster and Kibana are up and running. Any cluster will do, but a single Elasticsearch node with Kibana serves the purpose perfectly well. If you do not have a cluster at hand, use the prepared Docker configuration located in the repository root: docker-compose.yml.

    To use the docker-compose environment, run the following command. It spins up a one-node cluster running version 7.9.3.

   $ docker-compose up -d

    Optionally, use Elastic Cloud to spin up the cluster instead.
  4. Verify that your cluster is up and running. Open your browser and go to the Kibana URL http://127.0.0.1:5601/. You should see the Kibana Home page.

  5. Ingest geospatial data into Elasticsearch. I generated GeoJSON source files for all Towns/Cities/Villages (Mestá) together with Regions (Kraje) and Districts (Okresy). You can find those files in the data folder of the repository (obce.json, kraje.json, okresy.json). The JSON files are pre-generated from Geoportal using the GDAL library for format conversion.

    Read more in the section above dedicated to geospatial tools. After the import, you will have 3 indices in the cluster named kraje, obce, and okresy.

    Commands using the GDAL library, for reference (copy/paste might not work, depending on your GDAL install method):

   $ ogr2ogr -lco INDEX_NAME=kraje "ES:http://localhost:9200" -lco NOT_ANALYZED_FIELDS={ALL} \
     "$(pwd)/data/kraje.json"
   $ ogr2ogr -lco INDEX_NAME=obce "ES:http://localhost:9200" -lco NOT_ANALYZED_FIELDS={ALL} \
     "$(pwd)/data/obce.json"
   $ ogr2ogr -lco INDEX_NAME=okresy "ES:http://localhost:9200" -lco NOT_ANALYZED_FIELDS={ALL} \
     "$(pwd)/data/okresy.json"
   #Imported indices
   GET _cat/indices/obce,kraje,okresy?v
   health status index  uuid   pri rep docs.count docs.deleted store.size pri.store.size
   yellow open   okresy uidA   1   1         79            0      5.1mb          5.1mb
   yellow open   obce   uidB   1   1       2927            0     26.5mb         26.5mb
   yellow open   kraje  uidC   1   1          8            0      1.8mb          1.8mb
  6. Ingest data into Elasticsearch and stop the Logstash Docker container once the documents are indexed. This is a one-time indexing job. Run the following command from the root of the repository. If all works as expected, the output will show no errors related to the ingest process, and the documents being indexed will be printed on the screen.

    There are a few essential pieces in the ingest command.

    • a custom index template provides the correct mapping in Elasticsearch: -v $(pwd)/template.json:/tmp/template.json
    • the data folder is mounted into the container as the source directory: -v $(pwd)/data/:/usr/share/logstash/covid/
    • our custom Logstash configuration is mounted as the pipeline: -v $(pwd)/ls.conf:/usr/share/logstash/pipeline/logstash.conf
    • monitoring is disabled, because it would otherwise produce warning messages when not configured: -e MONITORING_ENABLED=false
   docker run --rm -it --network=host \
   -v $(pwd)/template.json:/tmp/template.json \
   -v $(pwd)/data/:/usr/share/logstash/covid/ \
   -v $(pwd)/ls.conf:/usr/share/logstash/pipeline/logstash.conf \
   -e MONITORING_ENABLED=false \
   docker.elastic.co/logstash/logstash:7.8.1
   # Check the index
   GET _cat/indices/covid-19-sk?v
   health status index       uuid     pri rep docs.count docs.deleted store.size pri.store.size
   yellow open   covid-19-sk uidddd   1   1       1751            0    557.5kb        557.5kb
  7. Additionally, import the template and the actual data for the annotations used in the visualizations.
   cd data
   # Import index template
   curl -s -H "Content-Type: application/json" -XPUT "localhost:9200/_template/milestones" \
     --data-binary "@template_milestones.json"; echo
   # You should see the message: {"acknowledged":true}
   # Index actual data
   curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary \
     "@milestones.bulk"; echo
   # List template
   GET _cat/templates/milestones?v
   name       index_patterns order version composed_of
   milestones [milestones]   0
   # List index
   GET _cat/indices/milestones?v
   health status index      uuid     pri rep docs.count docs.deleted store.size pri.store.size
   yellow open   milestones uidddd   1   1         17            0      7.6kb          7.6kb
  8. Import Saved Objects from the provided visualisations file and adjust the index patterns if needed.

    Note: if you named the imported geospatial indices as in the example (kraje, okresy, obce), there is no need to adjust the index patterns.

  9. Navigate to Dashboards, find the one named [covid-19] Overview Dashboard, open it, and check the content.

    If all goes well, you will see this dashboard (or a similar one, as the visualisations may evolve over time).

    Covid-19 Dashboard

    Covid-19 Dashboard in white

Note: Adjusting the saved objects should not be necessary if you used ogr2ogr (or the wrapper) to import the GeoJSON data. The mapping and index names will then match those in the Saved Objects. If you use a different setup, you will need to adjust the index pattern IDs inside the visualisations.ndjson file.
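
If the dashboard stays empty, a quick sanity check is to confirm that all five indices from the steps above exist and hold documents. The following Python sketch does this through the _cat/indices API with format=json. It assumes a local cluster at http://localhost:9200 without authentication; the helper names fetch_cat_indices and summarise are hypothetical, while the sample rows reuse the document counts shown in the listing above.

```python
import json
import urllib.request

# Index names taken from the tutorial steps above.
EXPECTED_INDICES = ("kraje", "obce", "okresy", "covid-19-sk", "milestones")


def fetch_cat_indices(base_url="http://localhost:9200"):
    """Return the rows of GET _cat/indices?format=json as a list of dicts.

    Assumption: the cluster is reachable locally and needs no credentials.
    """
    with urllib.request.urlopen(f"{base_url}/_cat/indices?format=json") as response:
        return json.load(response)


def summarise(cat_rows, expected=EXPECTED_INDICES):
    """Map each expected index to its document count, or None if it is missing."""
    counts = {row["index"]: int(row.get("docs.count") or 0) for row in cat_rows}
    return {name: counts.get(name) for name in expected}


# Offline example using the counts from the geospatial import listing above.
sample = [
    {"index": "kraje", "docs.count": "8"},
    {"index": "obce", "docs.count": "2927"},
    {"index": "okresy", "docs.count": "79"},
]
summary = summarise(sample)
print(summary)

# Against a running cluster you would call instead:
#   print(summarise(fetch_cat_indices()))
```

In the offline example, covid-19-sk and milestones map to None because they are absent from the sample rows, which is exactly how a skipped ingest step would show up.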


Let me know if you learned something new, or if you were able to follow this hands-on tutorial successfully. Feel free to comment or suggest how I can improve the content so you can enjoy it more.