Hi, I'm Susanna Morin.
A self-driven, quick-starting, passionate data scientist with 2 years of experience turning complex data into clear insights and actionable strategies, with the goal of optimizing patient care.
About
I hold a strong foundation in predictive and descriptive analytics, as well as statistical modeling applied to Electronic Health Records and Claims Data.
I've successfully led projects for government-funded population healthcare programs that have informed policy and resulted in additional funding at the state and federal levels.
My main tech stack relies heavily on Python, R, and SQL to ingest, process, analyze, and visualize data.
- Languages: Python, R, C++, HTML/CSS, Bash
- Databases: MySQL, PostgreSQL
- Libraries: NumPy, Pandas, Matplotlib, ggplot2, lm
- Tools & Technologies: Git, JIRA, TortoiseSVN
I’m particularly drawn to opportunities that allow me to work collaboratively with key business stakeholders to identify areas of value, develop solutions, and deliver insights to reduce overall cost of care for members and improve their clinical outcomes.
Experience
- Built a survival analysis model using Cox proportional hazards regression to investigate the association between survival time in Severe and Persistently Mentally Ill (SPMI) patients and predictor variables of interest, with the potential to inform healthcare policy
- Maintained and created tables used in downstream population healthcare analysis using Electronic Health Records (EHRs)
- Conducted a total cost of care time series analysis on healthcare provider utilization trends
- Developed and validated a Generalized Linear Mixed Effects Model for utilization care costs
- Tools: Python, R, SQL, Azure, Tableau, TortoiseSVN
- Conducted data validation in collaboration with the Nima Aghaeepour Laboratory at Stanford University; focused research efforts on patient phenotyping and trajectory prediction in neonatal health and morbidity, resulting in a publication in Science Translational Medicine
- Pre-processed data using RStudio; analyzed relationships between women’s health factors and offspring health outcomes in preterm labor using statistical methods
- Generalized a CNN model measuring knee osteoarthritis and improved performance by changing the biomarker from bone shape to cartilage thickness
- Tools: Python, R, OpenCV, Bash, TensorFlow, PyTorch, Git
- Conducted research contributing to the development and driving of technical standards for genomic data; focused on infrastructure for graph-based genomics
- Developed a genotyper using a Markov chain Monte Carlo (MCMC) probabilistic model that supports standard variant calling formats; improved the genotyper's accuracy and performance by using the Min-Cut algorithm to break out of sampling bottlenecks, maximizing mixing efficiency
- Established evaluation methods that compare accuracy metrics against gold-standard datasets
- Tools: C++, Git, Bash, Statistical Inference
- Developed statistical methods used in comparative genomics analysis
- Leveraged single-cell resolution data to improve understanding of cell type-specific transcriptional responses
- Investigated how single-cell RNA-seq and single-cell ATAC-seq data from mouse hearts correlated with each other across drug treatment and disease states to successfully predict enhancer activation due to heart stress
- Established a level of correlation between the two datasets and built a support vector machine (SVM) model to predict enhancer activation (single-cell ATAC-seq) from expression data (single-cell RNA-seq)
- Tools: Python, Scikit-learn, Support Vector Machines, CLI
Projects

A music streaming web app based on Django
- Tools: Django, HTML, CSS, Bootstrap, SQLite, AWS S3, Heroku
- Register/log in to the web app (with OAuth-based Google Sign-In).
- Search and filter songs based on language and singer.
- Create multiple playlists and add/remove songs to/from a playlist.
- Scroll through recently played/viewed songs.

An attention-based classification model that generates an answer for a given input image.

A Seq2Seq model that generates a short summary of the given input video.

An image generator based on the concept of generative adversarial networks (GANs).
Skills
Languages and Databases






Libraries





Frameworks






Other



Education
University of California, San Francisco
San Francisco, CA
Degree: Master of Science in Medical Informatics
Relevant Coursework:
- Machine Learning Algorithms
- Biostatistics
- Statistical Methods
University of California, Santa Cruz
Santa Cruz, CA
Degree: Bachelor of Science in Computer Science and Bioinformatics
Relevant Coursework:
- Data Structures and Algorithms
- Database Management Systems
- Operating Systems
- Machine Learning
- Ethical Algorithms