Data Science / AI Intern - Literature Mining & Graph Modeling 🎓

Description

AstraZeneca is seeking Master’s and PhD students studying Biology, Computer Science, Chemistry, Physics, Engineering, Biomedical Science, Pharmacology, Data Science, Bioinformatics, or a related discipline for a 10-week internship at our Waltham, MA site from June 1 to August 7, 2026. This internship sits at the intersection of data engineering, biomedical NLP, and translational science, and aims to speed up insight generation for R&D teams.

Position Description:

  • Build an end-to-end pipeline turning literature (papers, abstracts, patents) into a standardized knowledge graph with contextualized evidence.

  • Handle source selection, inclusion/exclusion criteria, updates, and data snapshots.

  • Develop NLP models for entity recognition, relation extraction, assertion detection, and context tagging (drug, indication, resistance, biomarker, outcome).

  • Encode domain relations (e.g., Drug–mechanism→Gene/Pathway; Biomarker–modulates→Outcome; ADC–targets→Antigen).

  • Map entities to controlled vocabularies; manage synonyms, disambiguation, and canonical IDs.

  • Implement edge-level confidence scoring (source quality, claim type, co-occurrence, citations, model certainty) with full evidence provenance.

  • Build graph storage (property graph or RDF) and queryable APIs.

  • Deliver interactive visualizations (UI or notebook) with filters, context toggles, and evidence drill-down.

  • Define metrics, run error analyses, and validate with scientific stakeholders.

  • Ensure reproducibility and documentation: version models/data; record architecture, assumptions, benchmarks; provide user guides.

  • Present outcomes to data science, oncology, and translational medicine teams.

Position Requirements:

  • Currently enrolled in a Master’s or PhD program in Biology, Computer Science, Chemistry, Physics, Engineering, Biomedical Science, Pharmacology, Data Science, Bioinformatics, or a related discipline.

  • Candidates must have an expected graduation date after August 2026.

  • U.S. work authorization is required at the time of application.

  • This role will not provide OPT support.

  • NLP and ML: NER, relation extraction, transformers; Python-based workflows.

  • Graph/data modeling: experience with Neo4j, NetworkX, or RDF/SPARQL.

  • Domain knowledge: genes, pathways, biomarkers, therapeutic modalities (incl. ADCs) preferred.

  • Reproducibility: version control, environment management, documentation.

  • Soft skills: problem-solving, communication, collaboration.

  • Tech stack: Python (spaCy, Hugging Face), scikit-learn; PyTorch or TensorFlow.

  • Data & viz: pandas; PySpark or Dask; Plotly/Dash, D3.js, Neo4j Bloom.

  • Dev practices: Git, Conda/Poetry, Docker, experiment tracking.

  • Ability to work onsite at the Waltham, MA site 3 to 5 days per week.

  • This role will not provide relocation assistance.

  • Compensation range: $41-$48 per hour

Details

Location: Waltham, MA
Term: Summer 2026
Posted: 1/29/2026
Expires: 2/12/2026

Other Internships at AstraZeneca


Machine Learning Research Intern - Summer 2026 🎓

AstraZeneca

Mississauga, ON, Canada · Summer 2026

Laboratory Automation Data Engineer Intern

AstraZeneca

Gaithersburg, MD · Summer 2026

Automation Technician Intern - Waltham, MA

AstraZeneca

Waltham, MA · Summer 2026

Quantitative Sciences & Statistical Programming Intern - Graduate 🎓 🛂

AstraZeneca

Boston, MA · Summer 2026