
Data Scientist

Overview: Reboot Rx is a Boston-based tech nonprofit startup dedicated to fast-tracking the development of affordable cancer treatments using existing non-cancer generic drugs (rebootrx.org). We are developing evidence synthesis technology powered by AI and machine learning to quickly sift through large amounts of data and find the most promising generic drugs to repurpose for cancer. Working with us is a great opportunity to gain hands-on experience at a collaborative, cutting-edge social impact startup at the intersection of data science, medicine, and policy.

Details: You should be passionate about expanding treatment options for cancer patients, a hardworking self-starter with attention to detail, and effective at both independent and collaborative work. You will work directly with the founders and will interact with other members of the highly cross-disciplinary team. This is a summer internship, and a commitment of 30-40 hours per week for at least 10-12 weeks is preferred, with the potential to continue part-time during the academic year. All work will be done remotely with a flexible work schedule.

This is a paid position. Many of our previous interns received summer fellowships from their schools to support their work with us, and all applicants are highly encouraged to pursue these opportunities. Applicants who won’t be supported by a fellowship must be authorized to work in the U.S. (we cannot provide visa sponsorship), and preference will be given to candidates who will be living in Massachusetts, Maine, or Rhode Island during the internship. Reboot Rx is an equal opportunity employer.

To apply: Send Devon Crittenden (hiring@rebootrx.org) your resume and a short explanation of why you would like to work at Reboot Rx. Please clearly state in the subject line the position(s) you are interested in. Applicants will be reviewed on a rolling basis.

Responsibilities:

Build and deploy machine learning (ML) and natural language processing (NLP) pipelines that extract features from scientific literature for language-based tasks such as named entity recognition, classification, and relation extraction
Regularly conduct analyses of various structured and unstructured datasets to identify patterns and answer research questions
Develop automated pipelines for data analysis, evaluate data quality, and improve algorithm performance
Develop APIs and backend services to manage, extract, and store data on cloud infrastructure (AWS)
Build data visualizations and web applications to communicate insights to the team

Preferred skills and qualifications:

Pursuing a degree in computer science, engineering, or other quantitative field
Experience with biomedical data
Expert coding skills in Python and experience with web frameworks such as Flask
Strong data management, data engineering, and statistical skills
Experience with NLP models (LSTMs, Transformers) and techniques (sentence embeddings, topic modeling)
Familiarity with ML and NLP algorithms and packages such as scikit-learn, spaCy, Gensim, TensorFlow, PyTorch, Snorkel, and StanfordNLP
Understanding of databases (SQL, MongoDB) and knowledge graphs
Experience with cloud environments and services (AWS, GCP, Azure)