
Sleep Health and Lifestyle

This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictional people.

The workspace is set up with one CSV file, data.csv, with the following columns:

  • Person ID
  • Gender
  • Age
  • Occupation
  • Sleep Duration: Average number of hours of sleep per day
  • Quality of Sleep: A subjective rating on a 1-10 scale
  • Physical Activity Level: Average number of minutes the person engages in physical activity daily
  • Stress Level: A subjective rating on a 1-10 scale
  • BMI Category
  • Blood Pressure: Indicated as systolic pressure over diastolic pressure
  • Heart Rate: In beats per minute
  • Daily Steps
  • Sleep Disorder: One of None, Insomnia or Sleep Apnea
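
Because Blood Pressure is stored as a single "systolic over diastolic" text value, it usually has to be split before it can be analysed numerically. The sketch below assumes the two numbers are separated by a slash (for example 120/80); the function name and the sample value are hypothetical and not taken from data.csv.

```python
def parse_blood_pressure(value: str) -> tuple[int, int]:
    """Split a 'systolic/diastolic' string into two integers.

    Assumption: the separator is '/', which the column description
    ("systolic pressure over diastolic pressure") suggests but does not confirm.
    """
    systolic, diastolic = value.strip().split("/")
    return int(systolic), int(diastolic)


# Hypothetical sample value, not a row from data.csv.
systolic, diastolic = parse_blood_pressure("120/80")
print(systolic, diastolic)
```

With the two numbers separated, systolic and diastolic pressure can be averaged or compared per sleep disorder like any other numeric column.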

Source: Kaggle

🌎 Some guiding questions to help you explore this data:

  1. Which factors could contribute to a sleep disorder?
  2. Does an increased physical activity level result in a better quality of sleep?
  3. Does the presence of a sleep disorder affect the subjective sleep quality metric?

Exploratory overview

SELECT *
FROM 'data.csv'
LIMIT 5;

-- Count NULLs per column: COUNT(*) counts every row, COUNT(col) skips NULLs.
SELECT
    COUNT(*) - COUNT("Person ID") AS null_person_id,
    COUNT(*) - COUNT("Gender") AS null_gender,
    COUNT(*) - COUNT("Age") AS null_age,
    COUNT(*) - COUNT("Occupation") AS null_occupation,
    COUNT(*) - COUNT("Sleep Duration") AS null_sleep_duration,
    COUNT(*) - COUNT("Quality of Sleep") AS null_quality_of_sleep,
    COUNT(*) - COUNT("Physical Activity Level") AS null_physical_activity_level,
    COUNT(*) - COUNT("Stress Level") AS null_stress_level,
    COUNT(*) - COUNT("BMI Category") AS null_bmi_category,
    COUNT(*) - COUNT("Blood Pressure") AS null_blood_pressure,
    COUNT(*) - COUNT("Heart Rate") AS null_heart_rate,
    COUNT(*) - COUNT("Daily Steps") AS null_daily_steps,
    COUNT(*) - COUNT("Sleep Disorder") AS null_sleep_disorder
FROM 'data.csv';

Every difference is 0, so the dataset contains no null values.
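
The same check can be cross-checked outside SQL. The following is a minimal sketch using only the Python standard library; the helper names `load_rows` and `count_missing` are hypothetical. It treats only absent or empty cells as missing, so the text value None in Sleep Disorder still counts as a regular category, as in the column description.

```python
import csv


def load_rows(path="data.csv"):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def count_missing(rows, columns):
    """Return {column: number of rows whose value is absent or empty}."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                missing[col] += 1
    return missing
```

Calling `count_missing(load_rows(), ["Age", "Sleep Disorder"])` in the workspace should return a zero for each listed column if the SQL result above holds.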

What kind of distribution do we have in the dataset?

-- Number of patients per age
SELECT
    "Age",
    COUNT(*) AS Count
FROM 'data.csv'
GROUP BY "Age"
ORDER BY "Age";

Patients are reasonably well distributed across the ages present in the dataset.

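-- CUBE adds subtotal rows (NULL in the aggregated-away column)
-- per disorder, per gender, and for the grand total.
SELECT
    "Sleep Disorder",
    "Gender",
    COUNT(*) AS count
FROM 'data.csv'
GROUP BY CUBE ("Sleep Disorder", "Gender")
ORDER BY "Sleep Disorder", "Gender";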

Sleep Disorder summary:

Insomnia: 77 patients in total (Female: 36, Male: 41)

None: 219 patients in total (Female: 82, Male: 137)

Sleep Apnea: 78 patients in total (Female: 67, Male: 11)

All patients: 374 in total (Female: 185, Male: 189)
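
Raw counts are easier to compare as percentages. The sketch below copies the counts from the summary above and computes, for each group, its share of all patients and the share of women within it; the function name `shares` is hypothetical.

```python
# Counts copied from the summary above: disorder -> (female, male).
counts = {
    "Insomnia": (36, 41),
    "None": (82, 137),
    "Sleep Apnea": (67, 11),
}


def shares(counts):
    """Per disorder, return (share of all patients, female share within
    the group), both as percentages rounded to one decimal."""
    total = sum(female + male for female, male in counts.values())
    result = {}
    for disorder, (female, male) in counts.items():
        group = female + male
        result[disorder] = (
            round(100 * group / total, 1),
            round(100 * female / group, 1),
        )
    return result


for disorder, (of_all, female_share) in shares(counts).items():
    print(f"{disorder}: {of_all}% of patients, {female_share}% female")
```

On these counts, Sleep Apnea covers about 20.9% of patients and about 85.9% of that group is female, while Insomnia is split more evenly between genders (about 46.8% female).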

Which factors could contribute to a sleep disorder?

Analysis of numeric variables

-- ROLLUP adds a grand-total row whose "Sleep Disorder" is NULL;
-- COALESCE labels that row 'Total'.
SELECT
    COALESCE("Sleep Disorder", 'Total') AS "Sleep Disorder",
    AVG("Sleep Duration") AS avg_sleep,
    AVG("Quality of Sleep") AS avg_quality,
    MAX("Sleep Duration") AS MAX_sleep,
    MAX("Quality of Sleep") AS MAX_quality,
    MIN("Sleep Duration") AS MIN_sleep,
    MIN("Quality of Sleep") AS MIN_quality
FROM 'data.csv'
GROUP BY ROLLUP ("Sleep Disorder");

Compared with patients without a sleep disorder, patients with a disorder have a lower average sleep duration and a lower average sleep quality. The decrease is more pronounced for Insomnia than for Sleep Apnea.
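
To put a number on that decrease, each group's average can be compared with the average of the None group. This is a minimal sketch using only the standard library; `gap_vs_baseline` is a hypothetical helper, and it assumes rows are dicts keyed by the column names listed at the top (for example as produced by `csv.DictReader`).

```python
from collections import defaultdict
from statistics import mean


def gap_vs_baseline(rows, value_col, group_col="Sleep Disorder", baseline="None"):
    """Return {group: group mean minus baseline mean} for every
    group except the baseline itself. Negative values mean the
    group averages lower than the baseline."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    base = mean(groups[baseline])
    return {g: mean(v) - base for g, v in groups.items() if g != baseline}
```

Calling it once with "Sleep Duration" and once with "Quality of Sleep" gives the size of the gap for Insomnia and Sleep Apnea relative to patients without a disorder.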