A L Y S S A

R O S E

A B O U T





Hi! Thanks for visiting my page!

I am currently a student at Tufts University in the School of Engineering, studying Data Science and Mathematics. Outside of class, I am active in Students for Exploration and Development of Space, writing software for a CubeSat that will be launched from the International Space Station. I am also on the Programming and UI team for a small radio telescope project that the Tufts Astrophysics Department will use for future research and education.

I am also the Co-Founder of the Interdisciplinary Data Intensive Applications Society (IDIAS, pronounced 'ideas'), a project-based data science education group. Here is the Facebook page for the group!

Feel free to browse my portfolio and some of my personal projects!

R E S U M E





P R O J E C T S





Bigram Topic Modeling (NASA)



This project sought to answer the question "How much does NASA spend on 'data activities'?" for the 2019 fiscal year. I approached it in three steps: collecting data on the project grants that were awarded, determining which project descriptions fell into the category 'data activity', and summing the current and total action obligation amounts to get a dollar total.
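The project itself fit a bigram topic model; the sketch below does not. As a simplified, standard-library stand-in, it flags a description as a 'data activity' when it contains any bigram from a hypothetical seed list (DATA_BIGRAMS), then sums the current and total obligation amounts of the flagged awards. The record fields (description, current_obligation, total_obligation) are assumptions, not the schema of the actual grant data.

```python
import re
from collections import Counter

# Hypothetical seed bigrams standing in for a learned "data activity" topic.
DATA_BIGRAMS = {("data", "analysis"), ("data", "management"), ("machine", "learning")}

def bigrams(text):
    """Lowercase the text, keep alphabetic tokens, and return adjacent word pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return list(zip(tokens, tokens[1:]))

def top_bigrams(descriptions, n=10):
    """Most frequent bigrams across all descriptions, useful when choosing seed bigrams."""
    counts = Counter(bg for text in descriptions for bg in bigrams(text))
    return counts.most_common(n)

def is_data_activity(description):
    """Flag a description that contains at least one seed bigram."""
    return any(bg in DATA_BIGRAMS for bg in bigrams(description))

def total_data_spend(awards):
    """Sum current and total obligation amounts over awards flagged as data activities."""
    current = total = 0.0
    for award in awards:
        if is_data_activity(award["description"]):
            current += award["current_obligation"]
            total += award["total_obligation"]
    return current, total
```

In a real pipeline the keyword check would be replaced by the topic model's assignment for each description; the summing step stays the same.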





Knowledge Graph + NLP (NASA)



How can NASA identify dead ends in its policies? Tackling this question began with a web scrape of the NASA website that hosts all of its policies (NASA Policy Directives, or NPDs, and NASA Procedural Requirements, or NPRs). The scrape had to handle multiple formats: standard HTML pages, pages that hosted PDFs of the policies, and pages that linked to the separate sections of a policy document. The text was then cleaned with stop-word removal, stemming, and sentence and word tokenization. Next, every reference to another NPD or NPR was located by running a regex over the tokenized word lists to find "NPR" or "NPD" and the associated policy number. These references were used to build a directed graph in which an edge A --> B means policy A references policy B. Pyvis was then used to visualize the graph.
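A minimal, standard-library sketch of the reference-extraction and graph-building steps. It assumes each policy has already been tokenized into a list of words and that a reference appears as an "NPR"/"NPD" token followed by a number token such as 7120.5E; the POLICY_NUMBER pattern is an assumption about the numbering format. It also treats a "dead end" as a policy with no outgoing references, which is only one possible reading of the question.

```python
import re

# Assumed shape of a policy number, e.g. "7120.5" or "1000.0C" (revision letter optional).
POLICY_NUMBER = re.compile(r"^\d{4}\.\d+[A-Z]?$")

def find_references(tokens):
    """Return IDs like 'NPR 7120.5E' wherever an NPR/NPD token is followed by a number token."""
    refs = []
    for prev, tok in zip(tokens, tokens[1:]):
        if prev.upper() in {"NPR", "NPD"} and POLICY_NUMBER.match(tok.upper()):
            refs.append(f"{prev.upper()} {tok.upper()}")
    return refs

def build_graph(policies):
    """policies maps a policy ID to its token list; add an edge A -> B when A references B."""
    graph = {}
    for source, tokens in policies.items():
        graph.setdefault(source, set())
        for target in find_references(tokens):
            if target != source:  # ignore self-references
                graph[source].add(target)
                graph.setdefault(target, set())  # referenced-but-unscraped policies become nodes too
    return graph

def dead_ends(graph):
    """Policies with no outgoing references (one possible definition of a dead end)."""
    return sorted(node for node, targets in graph.items() if not targets)
```

The returned adjacency dict is the kind of structure that could then be passed node by node and edge by edge to a visualization library such as Pyvis.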







LSTM Arabic Poem Generator (Personal)



I developed a web scraper that collected the Arabic poems used for training from here. The text was cleaned and processed by word tokenization, removal of non-essential characters, and removal of diacritics using the PyArabic library. A word-level LSTM model was trained on sequences of 50 to 60 words and then used to generate the poems. The project can be found at this repository.
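A minimal, standard-library sketch of the cleaning and sequence-building steps. Diacritics are stripped with a regex over the Arabic harakat range (U+064B to U+0652), superscript alef (U+0670), and tatweel (U+0640); this is a stand-in for PyArabic, not a copy of its behavior, and treating everything outside the basic Arabic letter block as non-essential is an assumption. The hypothetical helper make_sequences turns the cleaned words into (input window, next word) pairs, the usual training format for a word-level LSTM; the model itself is not shown.

```python
import re

# Harakat, superscript alef, and tatweel are deleted outright (stand-in for PyArabic).
STRIP = re.compile(r"[\u0640\u064B-\u0652\u0670]")
# Anything that is not a basic Arabic letter or whitespace is treated as non-essential.
NON_ESSENTIAL = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")

def clean(text):
    """Strip diacritics and tatweel, replace other non-letters with spaces, and split into words."""
    text = STRIP.sub("", text)
    text = NON_ESSENTIAL.sub(" ", text)
    return text.split()

def make_sequences(words, seq_len):
    """Map words to integer ids and pair each window of seq_len ids with the id that follows it."""
    vocab = {word: i for i, word in enumerate(sorted(set(words)))}
    ids = [vocab[word] for word in words]
    pairs = [(ids[i:i + seq_len], ids[i + seq_len]) for i in range(len(ids) - seq_len)]
    return vocab, pairs
```

With seq_len in the 50 to 60 range described above, each pair gives the model a window of words and asks it to predict the next one.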



Machine Learning Templates (Personal)



A personal project to create templates for machine learning concepts such as artificial neural networks, convolutional neural networks, k-means clustering, hierarchical clustering, natural language processing, Thompson sampling, random forest regression and classification, and more.

The R templates and the Python templates are available on my GitHub.
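As an illustration of what one of these templates might look like, here is a standard-library sketch of Thompson sampling for a Bernoulli multi-armed bandit. It is not taken from the repositories; the function name, the simulated rewards drawn from true_rates, and the Beta(1, 1) priors are assumptions of this sketch.

```python
import random

def thompson_sampling(true_rates, rounds, seed=0):
    """Bernoulli bandit where each arm keeps a Beta(successes + 1, failures + 1) posterior."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one sample from each arm's posterior and play the arm with the largest draw.
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        arm = max(range(n_arms), key=draws.__getitem__)
        # Simulated reward; in a real template this would come from the data set.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls
```

Because arms are chosen by sampling rather than by their current averages, weaker arms still get occasional pulls early on, and the best arm gradually dominates as its posterior sharpens.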





Predicting Cervical Cancer (Academic)



An academic project that sought to answer the question "Can we diagnose cervical cancer earlier?" The project analyzed healthcare data from hundreds of women to understand the relationship between 27 recorded attributes and whether or not each woman tested positive for cervical cancer. Attributes included age, smoking habits, number of sexual partners, and use of oral contraceptives, and diagnoses were made with four different tests.

The data was analyzed using techniques such as Singular Value Decomposition, Principal Component Analysis, Attribute Relevancy, Support Vector Machines, and Artificial Neural Networks. The code (Python and MATLAB) can be found on my GitHub, and the final paper discussing the results and their impact can be found here.
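To illustrate the idea behind Principal Component Analysis, here is a standard-library sketch that finds the first principal component by power iteration on the sample covariance matrix. This is a swapped-in technique for illustration, not the project's code (which used SVD in Python and MATLAB); it recovers only the leading component, and attributes measured on different scales (age versus number of partners, for example) would normally be standardized first.

```python
import math

def center(rows):
    """Subtract each column's mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]

def first_principal_component(rows, iters=200):
    """Leading eigenvector of the covariance matrix via power iteration (fails on constant data)."""
    cov = covariance(center(rows))
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, component):
    """Score of each centered row along the component (the 1-D PCA projection)."""
    return [sum(x * c for x, c in zip(row, component)) for row in center(rows)]
```

For points lying on the line y = 2x, the component comes out as (1, 2)/sqrt(5), the direction of that line.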





Infrastructure Funding (Personal)



A personal project that sought to understand where federal infrastructure funding would be most impactful. The project identified the U.S. counties with the highest child poverty rates that are also in states with the worst infrastructure scores, as determined by the ASCE Infrastructure Report Card. Data cleaning and processing were done in Python, and the visualizations were built in Tableau. The data processing code can be found on my GitHub.
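A minimal sketch of the filtering step, assuming county records with hypothetical state and child_poverty_rate fields and a mapping from each state to its ASCE letter grade. Converting grades to GPA-style points (with + and - worth a third of a point) is an assumption made here to rank the states, as are the default cutoffs of 5 states and 10 counties.

```python
# GPA-style points for letter grades: an assumed way to rank report-card grades.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def grade_to_points(grade):
    """Convert grades like 'C-' or 'D+' into numbers (+/- worth a third of a point)."""
    points = GRADE_POINTS[grade[0]]
    if grade.endswith("+"):
        points += 1 / 3
    elif grade.endswith("-"):
        points -= 1 / 3
    return points

def priority_counties(counties, state_grades, worst_states=5, top_counties=10):
    """Counties with the highest child poverty rates among the lowest-graded states."""
    ranked_states = sorted(state_grades, key=lambda s: grade_to_points(state_grades[s]))
    worst = set(ranked_states[:worst_states])
    in_worst = [c for c in counties if c["state"] in worst]
    return sorted(in_worst, key=lambda c: c["child_poverty_rate"], reverse=True)[:top_counties]
```

The filtered records could then be exported to CSV for mapping in a tool such as Tableau.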



Overview of Projects (NASA)



An overview of all the projects I completed during my four-month co-op at NASA. All of them were written in Python; unfortunately, I cannot upload their code.

T E D x





How does an idea turn into a bold statement? Turns out, the answer is months of editing and dozens of practice runs on stage! I had the honor of giving a TEDx talk on the intersection of two of my favorite topics: data science and ethics. This sub-13-minute talk explores the idea of data scientists taking a Hippocratic oath of their own, one centered on handling data properly to minimize the harm caused by the exchange (and leaking) of sensitive user information. This talk was one of the hardest things I've done, but it was the best experience! You can find the talk here.


S O U R C E R E R



Clicking the icon on the left will take you to my Sourcerer page.



This page gives a succinct view of my GitHub activity, including my number of commits and the languages I use most!



The page also gives an overview of the techniques I have the most experience with, including deep learning, machine learning, and natural language processing (all the fun buzzwords).



C O N T A C T





E M A I L