Sleep Health and Lifestyle

This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for close to 400 fictional people.

The workspace is set up with one CSV file, data.csv, with the following columns:

Person ID
Gender
Age
Occupation
Sleep Duration: Average number of hours of sleep per day
Quality of Sleep: A subjective rating on a 1-10 scale
Physical Activity Level: Average number of minutes the person engages in physical activity daily
Stress Level: A subjective rating on a 1-10 scale
BMI Category
Blood Pressure: Indicated as systolic pressure over diastolic pressure
Heart Rate: In beats per minute
Daily Steps
Sleep Disorder: One of None, Insomnia or Sleep Apnea
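
Because Blood Pressure stores both readings in a single text value, it has to be split before it can be analysed numerically. A minimal Python sketch, assuming each value looks like `120/80` (a format inferred from the column description, not confirmed here); the helper name `split_blood_pressure` is hypothetical:

```python
def split_blood_pressure(reading: str) -> tuple[int, int]:
    """Split a 'systolic/diastolic' string into two integers."""
    # Assumes exactly one '/' separator; anything else raises ValueError
    systolic, diastolic = reading.strip().split("/")
    return int(systolic), int(diastolic)


# Illustrative value, not taken from data.csv
print(split_blood_pressure("120/80"))  # (120, 80)
```

Keeping the two readings as separate integers would let them be averaged or compared per group like the other numeric columns.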

Source: Kaggle

🌎 Some guiding questions to help you explore this data:

Which factors could contribute to a sleep disorder?
Does an increased physical activity level result in a better quality of sleep?
Does the presence of a sleep disorder affect the subjective sleep quality metric?

Exploratory overview

SELECT *
FROM 'data.csv'
LIMIT 5;

SELECT
    COUNT(*) - COUNT("Person ID") AS null_person_id,
    COUNT(*) - COUNT("Gender") AS null_gender,
    COUNT(*) - COUNT("Age") AS null_age,
    COUNT(*) - COUNT("Occupation") AS null_occupation,
    COUNT(*) - COUNT("Sleep Duration") AS null_sleep_duration,
    COUNT(*) - COUNT("Quality of Sleep") AS null_quality_of_sleep,
    COUNT(*) - COUNT("Physical Activity Level") AS null_physical_activity_level,
    COUNT(*) - COUNT("Stress Level") AS null_stress_level,
    COUNT(*) - COUNT("BMI Category") AS null_bmi_category,
    COUNT(*) - COUNT("Blood Pressure") AS null_blood_pressure,
    COUNT(*) - COUNT("Heart Rate") AS null_heart_rate,
    COUNT(*) - COUNT("Daily Steps") AS null_daily_steps,
    COUNT(*) - COUNT("Sleep Disorder") AS null_sleep_disorder
FROM data.csv;

The dataset has no null values in any column.
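
The query above lists every column by hand. As a rough alternative, the same check could be sketched in Python so that it covers whatever columns the file has. This assumes an empty or whitespace-only cell counts as missing, which may not match how the SQL engine decides what is NULL; the function name `count_missing` is hypothetical:

```python
import csv
import io
from collections import Counter


def count_missing(rows):
    """Return {column: number of empty cells} for an iterable of dict rows."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # extra unnamed fields from malformed rows are skipped
            missing[column] += 0  # register the column even if it has no gaps
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing)


# Illustrative two-row sample, not taken from data.csv
sample = io.StringIO("Person ID,Gender,Age\n1,Male,27\n2,,28\n")
print(count_missing(csv.DictReader(sample)))
# {'Person ID': 0, 'Gender': 1, 'Age': 0}
```

For the real file, the rows would come from `csv.DictReader(open("data.csv", newline=""))` instead of the in-memory sample.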

What kind of distribution do we have in the dataset?

SELECT 
	"Age",
	COUNT(*) AS Count
FROM data.csv
GROUP BY "Age"
ORDER BY "Age" ;

The patients are reasonably well spread across the age range in the dataset.

SELECT
	"Sleep Disorder",
	"Gender",
	COUNT(*) AS count
FROM data.csv
GROUP BY CUBE ("Sleep Disorder", "Gender")
ORDER BY "Sleep Disorder", "Gender";

Sleep Disorder summary:

Insomnia: 77 patients in total (36 female, 41 male)

None: 219 patients in total (82 female, 137 male)

Sleep Apnea: 78 patients in total (67 female, 11 male)

All patients: 374 in total (185 female, 189 male)

Which factors could contribute to a sleep disorder?

Analysis of numeric variables

SELECT 
	COALESCE("Sleep Disorder", 'Total') AS "Sleep Disorder",
	AVG("Sleep Duration") AS avg_sleep,
	AVG("Quality of Sleep") AS avg_quality,

	MAX("Sleep Duration") AS MAX_sleep,
	MAX("Quality of Sleep") AS MAX_quality,

	MIN("Sleep Duration") AS MIN_sleep,
	MIN("Quality of Sleep") AS MIN_quality

FROM data.csv
GROUP BY ROLLUP ("Sleep Disorder");

Compared with patients without a sleep disorder, both disorder groups show a lower average sleep duration and a lower average sleep quality. The drop is most pronounced for Insomnia.
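
The same per-disorder comparison could be repeated for the other numeric columns, such as Stress Level or Physical Activity Level. A minimal Python sketch of the grouping step, assuming rows are dictionaries keyed by column name (as `csv.DictReader` would produce); `mean_by_group` is a hypothetical helper and the sample rows are invented for illustration:

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_col, value_col):
    """Average value_col for each distinct value of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    return {group: mean(values) for group, values in groups.items()}


# Illustrative rows, not taken from data.csv
sample = [
    {"Sleep Disorder": "None", "Stress Level": "4"},
    {"Sleep Disorder": "None", "Stress Level": "6"},
    {"Sleep Disorder": "Insomnia", "Stress Level": "7"},
]
print(mean_by_group(sample, "Sleep Disorder", "Stress Level"))
# {'None': 5.0, 'Insomnia': 7.0}
```

A difference in group averages only shows an association. It does not by itself establish that a factor contributes to a sleep disorder.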

