On this website, you will find:
Analysis - Smokers’ Insurance Preference: Investigation of association between smoking and health insurance.
Analysis - Insurance Charge Prediction: Building a prediction model for health insurance premiums.
Discussion: Findings of our research.
Interactive tools - Mapping: Location map and information about agencies that provide services related to health insurance.
Interactive tools - Shiny: A dashboard for predicting insurance price. Users enter their age, BMI, smoking status, and whether they have children, and the interactive page returns a predicted health insurance price and a price range.
Final formatted report of our research.
You can also reach us by clicking the email icon in the top right corner of the website. Our GitHub page can be found by clicking the GitHub icon.
We divide our main research topic, the relationship between smoking and health insurance, into two parts:
In the first part, insurance preferences, we explore people's insurance preferences and their smoking status to investigate whether there is a potential association between smoking and insurance. We also investigate other variables that could confound or interact with the relationship between smoking and insurance.
The second part, insurance charges, supports our main research topic: we use factors such as age, BMI, and smoking status not only to explain which personal characteristics affect insurance cost, but also to build a prediction model for health insurance charges. By doing so, we gain better insight into the underlying relationship between health insurance and smoking, along with other important variables that contribute to variation in insurance cost.
Datasets for analysis:
Community Health Survey Public Use Data. This dataset contains survey questions regarding smoking and health insurance. We extracted related questions from the survey for analysis.
Medical Cost Personal Dataset. This dataset sheds light on insurance charges for different personal characteristics, such as smoking, age, BMI, region, etc.
Datasets for mapping:
Primary Care Access and Planning - Health Insurance Enrollment. This dataset is used for mapping of health consulting centers in NYC.
Medicaid Enrolled Provider Listing. This dataset is used for mapping of Medicaid provider locations in NYC.
In addition, locations of major health insurance carriers in NYC were collected manually and used to create an interactive map of these carriers.
We are Biostatistics students from Columbia University Mailman School of Public Health. This is a project for the course P8105 Data Science.