Data Science / AI Intern - Literature Mining & Graph Modeling 🎓
Description
AstraZeneca is seeking Master’s and PhD students studying Biology, Computer Science, Chemistry, Physics, Engineering, Biomedical Science, Pharmacology, Data Science, Bioinformatics, or a related discipline for a 10-week internship at our site in Waltham, MA, from June 1, 2026 to August 7, 2026. This internship sits at the intersection of data engineering, biomedical NLP, and translational science, enabling faster insight generation for R&D teams.
Position Description:
- Build an end-to-end pipeline turning literature (papers, abstracts, patents) into a standardized knowledge graph with contextualized evidence.
- Handle source selection, inclusion/exclusion criteria, updates, and data snapshots.
- Develop NLP for entity recognition, relation extraction, assertion detection, and context tagging (drug, indication, resistance, biomarker, outcome).
- Encode domain relations (e.g., Drug–mechanism→Gene/Pathway; Biomarker–modulates→Outcome; ADC–targets→Antigen).
- Map entities to controlled vocabularies; manage synonyms, disambiguation, and canonical IDs.
- Implement edge-level confidence scoring (source quality, claim type, co-occurrence, citations, model certainty) with full evidence provenance.
- Build graph storage (property graph or RDF) and queryable APIs.
- Deliver interactive visualization (UI or notebook) with filters, context toggles, and evidence drill-down.
- Define metrics, run error analyses, and validate with scientific stakeholders.
- Ensure reproducibility and documentation: version models/data; record architecture, assumptions, benchmarks; provide user guides.
- Present outcomes to data science, oncology, and translational medicine teams.
Position Requirements:
- Master’s and PhD students studying Biology, Computer Science, Chemistry, Physics, Engineering, Biomedical Science, Pharmacology, Data Science, Bioinformatics, or a related discipline.
- Candidates must have an expected graduation date after August 2026.
- US Work Authorization is required at time of application.
- This role does not provide OPT support.
- NLP and ML: NER, relation extraction, transformers; Python-based workflows.
- Graph/data modeling: experience with Neo4j, NetworkX, or RDF/SPARQL.
- Domain knowledge: genes, pathways, biomarkers, therapeutic modalities (incl. ADCs) preferred.
- Reproducibility: version control, environment management, documentation.
- Soft skills: problem-solving, communication, collaboration.
- Tech stack: Python (spaCy, Hugging Face), scikit-learn; PyTorch or TensorFlow.
- Data & viz: pandas; PySpark or Dask; Plotly/Dash, D3.js, Neo4j Bloom.
- Dev practices: Git, Conda/Poetry, Docker, experiment tracking.
- Ability to work onsite at the Waltham, MA site 3-5 days per week.
- This role does not provide relocation assistance.
- Compensation range: $41-$48 per hour.
Details
- Location: Waltham, MA
- Term: Summer 2026
- Posted: 1/29/2026
- Expires: 2/12/2026