Covid-19 Data

Aggregated dataset of publicly available data on Covid-19 in Luxembourg

Datasets

These data should not be considered authoritative, complete or official in any way. They are aggregated from multiple sources and provided on a best-effort basis.

All datasets are currently only available as .csv files below and on GitHub.

Missing data is indicated by an empty cell. Dates are formatted as YYYY-MM-DD, and dates with times as ISO 8601 in local Luxembourg time (including the UTC offset, e.g. +02:00). See the schema boxes for further details.
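A minimal sketch of reading one of the CSV files with Python's standard library, assuming the formats described above: an empty cell becomes None, date columns are parsed with date.fromisoformat and date_published with datetime.fromisoformat. The helper names (parse_value, read_rows) and the small KINDS mapping are illustrative, not part of the dataset; only a few schema columns are listed there.

```python
import csv
import io
from datetime import date, datetime


def parse_value(raw, kind):
    """Convert one CSV cell; an empty cell means the value is missing."""
    if raw is None or raw == "":
        return None
    if kind == "date":
        return date.fromisoformat(raw)  # YYYY-MM-DD
    if kind == "datetime":
        return datetime.fromisoformat(raw)  # ISO 8601 with UTC offset
    if kind == "int":
        return int(raw)
    if kind == "float":
        return float(raw)
    return raw  # anything else stays a string


# Hypothetical mapping for a few schema columns; extend as needed.
KINDS = {
    "date": "date",
    "date_published": "datetime",
    "cases": "int",
    "cases_new": "int",
    "cases_residents_prop": "float",
}


def read_rows(text):
    """Parse CSV text into dicts; columns not in KINDS are kept as strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {col: parse_value(raw, KINDS.get(col, "str")) for col, raw in row.items()}
        for row in reader
    ]
```

Keeping missing values as None (rather than 0) preserves the distinction between "not reported" and "zero", which matters because the time series are not imputed.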

Time series

Time series contain one row for each day with available data since the start of the pandemic (around the end of February in Luxembourg). If a data point (i.e. column) isn't available for a given day, it is not imputed or filled but left missing (i.e. blank).
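Because missing values are not filled in, gaps have to be handled explicitly. A small sketch, assuming rows were already parsed into date objects and that days without any data simply have no row: one function lists the calendar days without a row, the other computes a day-to-day change only when both the day and the previous calendar day have a value. Both function names are illustrative.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional


def missing_days(dates: Iterable[date]) -> List[date]:
    """Calendar days between the first and last date that have no row."""
    present = set(dates)
    if not present:
        return []
    gaps = []
    day, end = min(present), max(present)
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def daily_change(series: Dict[date, Optional[int]]) -> Dict[date, Optional[int]]:
    """Day-to-day change of a {date: value} series without imputation.

    The change is None unless both the day and the previous calendar day
    have a value.
    """
    return {
        day: (
            value - series[day - timedelta(days=1)]
            if value is not None and series.get(day - timedelta(days=1)) is not None
            else None
        )
        for day, value in sorted(series.items())
    }
```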

Files are provided for every day since data collection began, each representing what was known on that date. Unless you are analysing reporting lag, you probably want to use the latest data.

Data published in the government's daily update doesn't include any information on which date it applies to, so that date has to be assumed. For example, reproduction numbers are only updated every 2-3 days.

timeline.csv

timeline.csv includes all revisions, updates and additional data for past dates. This is the most complete and accurate data and what most users will want.

If multiple sources provide a data point for a given day, the latest one (or, for proportions, the most precise one) is used. Every row is then assigned the date_published of the newest source contributing to it.
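The merge rule above can be sketched as follows. This is an illustration under assumptions, not the code used to build timeline.csv: each source observation is a hypothetical (date, column, value, published) tuple, proportion columns are recognised by their _prop suffix, and "most precise" is approximated as the value with the most decimal digits, with ties going to the newer source.

```python
def decimals(value):
    """Number of digits after the decimal point in the value's text form."""
    text = repr(value)
    return len(text.split(".")[1]) if "." in text else 0


def merge_sources(observations):
    """Combine (date, column, value, published) tuples into one row per date."""
    best = {}  # (date, column) -> (value, published)
    for day, column, value, published in observations:
        key = (day, column)
        if key not in best:
            best[key] = (value, published)
            continue
        old_value, old_published = best[key]
        if column.endswith("_prop"):
            # Proportions: prefer the more precise value, then the newer one.
            better = (decimals(value), published) > (decimals(old_value), old_published)
        else:
            # Everything else: the most recently published value wins.
            better = published > old_published
        if better:
            best[key] = (value, published)

    rows = {}
    for (day, column), (value, published) in best.items():
        row = rows.setdefault(day, {"date": day, "date_published": published})
        row[column] = value
        # A row carries the date_published of the newest source it uses.
        row["date_published"] = max(row["date_published"], published)
    return [rows[day] for day in sorted(rows)]
```

Note that a newer but less precise proportion does not count as contributing, so it does not move the row's date_published.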

as_reported.csv

as_reported.csv only includes data as it was known on a given date. This is useful if you want to report the change in the total number of cases, tests or deaths, but not for accurately representing the evolution of tests or cases.
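Comparing the two files day by day shows how far the figures attributed to a day in timeline.csv deviate from what was reported on that day, which is one way to look at reporting lag. A minimal sketch, assuming both files were parsed into {date: cases_new} dictionaries; the function name is illustrative.

```python
def revisions(as_reported, timeline):
    """Per-day difference between timeline.csv and as_reported.csv values.

    Only days with a value in both inputs are compared. A positive number
    means timeline.csv attributes more to that day than was reported on it.
    """
    return {
        day: timeline[day] - reported
        for day, reported in sorted(as_reported.items())
        if reported is not None and timeline.get(day) is not None
    }
```

The same function works for tests_new or deaths_new, since it only relies on the two dictionaries sharing date keys.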

Schema
Each column is listed as name (type): description. Columns sharing a bullet share its type and description. Types ending in "?" may be empty (missing).

  • date (Date): Day the data is applicable to.
  • date_published (DateTime): Date and time the data was published or revised by the source.
  • cases (Int?): Confirmed/diagnosed cases. A person is only counted once, no matter how often they test positive. It is unclear whether patients diagnosed without a test are included.
  • cases_residents, cases_nonresidents (Int?): Cases among residents and non-residents. These are only published once and not revised by any source, and thus do not agree with the revised total of cases on any given day.
  • cases_new (Int?): New cases on the given day. It is unclear whether this is the day the test sample is taken, the day the analysis is made or the day the result is reported. Revisions are made, implying that it is not the day the result is reported. For as_reported.csv this is always the number of new cases reported that day.
  • cases_residents_new, cases_nonresidents_new (Int?): See cases_residents.
  • cases_residents_prop, cases_nonresidents_prop (Float?): Proportion of cases among residents and non-residents. These columns are calculated from cases_residents and cases_nonresidents and are thus not revised, but they can be used under the assumption that revisions affect residents and non-residents equally.[citation needed]
  • cases_sex_f_prop, cases_sex_m_prop, cases_sex_x_prop, cases_sex_u_prop (Float?): Proportion of cases by sex (female, male, non-binary and unknown).
  • cases_age_mean (Float?): Mean age of all cases.
  • active (Int?): Total number of cases considered active.
  • active_change (Int?): Change in the number of active cases.
  • recovered (Int?): Total number of infected persons considered to have recovered. (Cases are ‘recovered’ 14 days after diagnosis and 48 hours after any symptoms have disappeared.)
  • recovered_new (Int?): Change in the number of recovered cases.
  • tests (Int?): Cumulative number of persons tested for COVID-19.
  • tests_residents, tests_nonresidents (Int?): Total number of residents/non-residents tested for COVID-19.
  • tests_new (Int?): Daily new tests. It is unclear whether this is the day the test sample is taken, the day the analysis is made or the day the result is reported. Revisions are made, implying that it is not the day the result is reported. For as_reported.csv this is always the number of new tests reported that day.
  • tests_residents_new, tests_nonresidents_new (Int?): See tests_residents.
  • tests_residents_prop, tests_nonresidents_prop (Float?): Proportion of tests among residents and non-residents. These columns are calculated from tests_residents and tests_nonresidents and are thus not revised, but they can be used under the assumption that revisions affect residents and non-residents equally.[citation needed]
  • tests_sex_f_prop, tests_sex_m_prop, tests_sex_x_prop, tests_sex_u_prop (Float?): Proportion of tests by sex (female, male, non-binary and unknown).
  • deaths (Int?): Cumulative number of deaths from COVID-19.
  • deaths_new (Int?): Daily new deaths from COVID-19.
  • deaths_transfer (Int?): Persons transferred to Luxembourg from other regions who died as a result of COVID-19.
  • deaths_transfer_new (Int?): See deaths_transfer.
  • deaths_hospital_prop, deaths_nonhospital_prop (Float?): Proportion of persons who died as a result of COVID-19 in hospital or elsewhere.
  • deaths_sex_f_prop, deaths_sex_m_prop, deaths_sex_x_prop, deaths_sex_u_prop (Float?): Proportion of deaths by sex (female, male, non-binary and unknown).
  • deaths_age_mean, deaths_age_median, deaths_age_min, deaths_age_max (Float?): Mean, median, lowest and highest age of patients who died of COVID-19.
  • hospital_total (Int?): COVID-19 patients currently hospitalised.
  • hospital_normal (Int?): As hospital_total, in normal care.
  • hospital_intensive (Int?): As hospital_total, in intensive care.
  • hospital_total_transfer (Int?): Currently hospitalised patients transferred to Luxembourg (e.g. from Grand-Est).
  • hospital_normal_transfer (Int?): As hospital_total_transfer, in normal care.
  • hospital_intensive_transfer (Int?): As hospital_total_transfer, in intensive care.
  • hospital_total_change (Int?): Day-to-day change in the number of COVID-19 patients currently hospitalised.
  • hospital_normal_change (Int?): As hospital_total_change, in normal care.
  • hospital_intensive_change (Int?): As hospital_total_change, in intensive care.
  • hospital_total_transfer_change (Int?): Day-to-day change in the number of currently hospitalised COVID-19 patients transferred to Luxembourg (e.g. from Grand-Est).
  • hospital_normal_transfer_change (Int?): As hospital_total_transfer_change, in normal care.
  • hospital_intensive_transfer_change (Int?): As hospital_total_transfer_change, in intensive care.
  • hospital_admissions (Int?): Total admissions to hospital. Only includes inpatients (“hospitalisations stationnaires”).[citation needed]
  • hospital_admissions_new (Int?): Daily admissions to hospital.
  • hospital_admissions_new_all (Int?): Daily admissions to hospital, including outpatients.[citation needed]
  • hospital_admissions_new_ma7 (Int?): 7-day moving average of hospital admissions (possibly centered; unclear).
  • hospital_discharges (Int?): Total discharges from hospital. Only includes inpatients (“hospitalisations stationnaires”).[citation needed]
  • hospital_discharges_new (Int?): Daily discharges from hospital.
  • hospital_discharges_new_all (Int?): Daily discharges from hospital, including outpatients.[citation needed]
  • hospital_discharges_new_ma7 (Int?): 7-day moving average of hospital discharges (possibly centered; unclear).
  • reproduction_number_effective (Float?): Effective reproduction number. See for details on the definition used.
  • reproduction_number_effective_low, reproduction_number_effective_high (Float?): Lower and upper bound of the estimate. It is unclear whether these are 50% or 90% intervals or 1 standard deviation.
  • reproduction_number (Float?): Reproduction number Rt. See reproduction_number_effective.
  • reproduction_number_low, reproduction_number_high (Float?): Lower and upper bound of the estimate. It is unclear whether these are 50% or 90% intervals or 1 standard deviation.
  • state (String?): State of the pandemic, either imported, for imported cases only, or local, once community transmission has been detected.
  • flags ([Int]?): Flags representing issues detected with the data:
      • 1: cases_residents + cases_nonresidents ≠ cases
      • 2: cases_residents_new + cases_nonresidents_new ≠ cases_new
      • 3: tests_residents + tests_nonresidents ≠ tests
      • 4: tests_residents_new + tests_nonresidents_new ≠ tests_new
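The four flags are consistency checks between a total and its resident/non-resident split. A sketch of how they could be recomputed from a parsed row (a dict with None for missing values); this is an illustration, not necessarily how the published flags are produced. In particular, skipping a check when any of its inputs is missing is an assumption.

```python
# Flag number -> (resident column, non-resident column, total column)
FLAG_CHECKS = {
    1: ("cases_residents", "cases_nonresidents", "cases"),
    2: ("cases_residents_new", "cases_nonresidents_new", "cases_new"),
    3: ("tests_residents", "tests_nonresidents", "tests"),
    4: ("tests_residents_new", "tests_nonresidents_new", "tests_new"),
}


def compute_flags(row):
    """Return the sorted flag numbers whose check fails for this row."""
    flags = []
    for flag, (residents, nonresidents, total) in FLAG_CHECKS.items():
        values = [row.get(residents), row.get(nonresidents), row.get(total)]
        if any(v is None for v in values):
            continue  # cannot check without all three values
        if values[0] + values[1] != values[2]:
            flags.append(flag)
    return sorted(flags)
```

Flag 1 is expected on most days after revisions, since cases_residents and cases_nonresidents are never revised while cases is.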

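Since it is unclear whether hospital_admissions_new_ma7 and hospital_discharges_new_ma7 are centered or trailing averages, one option is to compute both from the daily columns and compare them with the published values. A minimal sketch, assuming a list with one entry per calendar day in date order and None for missing values; windows that are incomplete or contain a missing value yield None. Rounding to an integer (the columns are typed Int?) is left out because the rounding rule is not documented.

```python
def moving_average(values, window=7, centered=True):
    """Moving average over daily values in date order.

    centered=True averages the window around each day (3 days on either
    side for window=7); centered=False averages the day and the 6 days
    before it. Incomplete windows or windows with a None give None.
    """
    if centered and window % 2 == 0:
        raise ValueError("a centered window needs an odd length")
    result = []
    for i in range(len(values)):
        start = i - window // 2 if centered else i - window + 1
        end = start + window
        if start < 0 or end > len(values):
            result.append(None)
            continue
        chunk = values[start:end]
        if any(v is None for v in chunk):
            result.append(None)
        else:
            result.append(sum(chunk) / window)
    return result
```

The two variants give identical numbers shifted by three days, so comparing which alignment matches the published column indicates whether it is centered.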
Age Distributions

Age distributions provide one row for every age group and date. These data have an unknown amount of lag (likely 2-3 days) and their accuracy is limited.

deaths_age.csv

Age of persons that died as a result of COVID-19, binned into very large age groups. Available since mid April.

active_age.csv

Ages of currently[citation needed] active infections.

hospital_intensive_age.csv

Up to and including May 17, this was the age of patients currently hospitalised in intensive care. Since May 18[citation needed] it reflects all[citation needed] patients who have been in intensive care at some point[citation needed].

There are indications that these data are incomplete. For example, from May 13 to May 16 they reported a patient in the 0-5 age group who was subsequently not included in the age distribution of the total population.

hospital_normal_age.csv

As with the intensive care data, these switched from representing current to all hospitalised patients on May 18.
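Because both hospital age distributions change meaning on May 18, rows from before and after that date should not be compared directly. A minimal sketch that labels each row with the patient population it describes, assuming the year is 2020 (as in the schema example) and that rows are dicts with a parsed date; the label strings and the function name are illustrative.

```python
from datetime import date

# First day on which the hospital age distributions cover all patients
# who were ever in that kind of care (year assumed to be 2020).
SWITCH_DATE = date(2020, 5, 18)
HOSPITAL_MEASURES = {"hospital_intensive_age", "hospital_normal_age"}


def population(row):
    """Return "current" or "cumulative" for hospital rows, None otherwise."""
    if row["measure"] not in HOSPITAL_MEASURES:
        return None  # no documented change for other measures
    return "current" if row["date"] < SWITCH_DATE else "cumulative"
```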

Schema
Each column is listed as name (type, example): description.

  • date (Date, e.g. "2020-05-11"): Day the data is applicable to.
  • date_published (DateTime, e.g. "2020-05-12T17:30:00.000+02:00"): Date and time the data was published or revised by the source.
  • measure (String, e.g. "active_age"): Measure described; should be the same as the filename.
  • group (String, e.g. "80 - 85"): Human-readable name of the age group.
  • interval (String, e.g. "[80, 85)"): Age interval of the group.
  • min_age (Int?, e.g. "80"): Lower age bound (usually inclusive).
  • max_age (Int?, e.g. "85"): Upper age bound (usually exclusive).
  • proportion (Float, e.g. "0.1234"): Proportion of the age group.
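The interval column uses bracket notation, where "[" or "]" marks an inclusive bound and "(" or ")" an exclusive one. A sketch that parses it and sums the proportions per measure and date, which would be expected to come out close to 1 (not exactly, because of rounding). It assumes open-ended groups leave a bound empty (e.g. "[85, )"); that notation is a guess, since no such example is given.

```python
import re
from collections import defaultdict

INTERVAL = re.compile(r"^([\[(])\s*(\d*)\s*,\s*(\d*)\s*([\])])$")


def parse_interval(text):
    """Parse e.g. "[80, 85)" into (min_age, min_inclusive, max_age, max_inclusive).

    An empty bound (assumed notation for open-ended groups) becomes None.
    """
    match = INTERVAL.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    left, low, high, right = match.groups()
    return (
        int(low) if low else None,
        left == "[",
        int(high) if high else None,
        right == "]",
    )


def proportion_totals(rows):
    """Sum the proportion column per (measure, date)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["measure"], row["date"])] += row["proportion"]
    return dict(totals)
```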

Data Sources

These data have been compiled from publicly available sources and media reports. Especially in the first few weeks of the pandemic, data was very sparse and mostly available through media reports and press statements. Later on, the data was scraped from various government websites, as well as from graphics and datasets released by the Ministry of Health. These sources do not provide clear documentation of the data presented, so assumptions (e.g. the date a value is applicable to) have to be made when ingesting them into this dataset.

Issues

Please report problems with the dataset (e.g. inconsistencies, missing or unclear documentation) by opening an issue on GitHub or sending an email to mail+covid19@donneeen.lu.

Known Issues

  • Starting Ministry of Health only updates tests, cases and deaths on weekends.
  • : sources disagree on deaths, 111 assumed to be correct. Revised to 110 the next day.
  • : source for age distributions wasn't updated, no data available for this day.
  • hospital_intensive_transfer decreases to 0 on , is 1 on then 0 again on .
  • reproduction_number hasn't been updated since .
  • has a big increase in tests, these are likely from earlier days.
  • Before active + recovered + deaths ≠ cases.
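The last item refers to the identity that every case is either active, recovered or dead. A sketch for finding the days where it does not hold, assuming parsed timeline rows as dicts with None for missing values; rows with any missing input are skipped, and the function name is illustrative.

```python
def identity_violations(rows):
    """Return (date, difference) where active + recovered + deaths != cases."""
    violations = []
    for row in rows:
        parts = [row.get(c) for c in ("active", "recovered", "deaths", "cases")]
        if any(v is None for v in parts):
            continue  # cannot check incomplete rows
        active, recovered, deaths, cases = parts
        difference = active + recovered + deaths - cases
        if difference != 0:
            violations.append((row["date"], difference))
    return violations
```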

Missing Data

The following data is currently not publicly available:

  • Total number of cases hospitalised at one point as well as the total number of patients needing intensive care (estimated at around 90-100 on )
  • Age distributions of all diagnosed cases (Distribution of active cases is only available since early May.)
  • Total intensive care usage and capacity (including non-COVID-19)
  • Cases that were successfully traced
  • Cases that were associated with a cluster
  • Cases that were a-/presymptomatic when tested
  • Tests conducted as part of the “Large Scale Testing” campaign vs. tests conducted because of symptoms, possible contacts etc.
  • Total tests conducted (only persons tested is known).
  • Cases per date of onset of symptoms and/or probable infection

License

CC0: To the extent possible under law, Ben Elsen has waived all copyright and related or neighboring rights to the donnéeën.lu COVID-19 dataset.