The landscape of biomedical research

This interactive visualization displays 21 million scientific papers collected in the PubMed database, maintained by the United States National Library of Medicine and encompassing all biomedical and life science fields of research.

You can scroll the narration in the left part of the screen, and interact with the visualization in the right part of the screen. Zooming in loads additional papers. Information about each individual paper appears on mouse-over, and clicking on a paper opens its PubMed page in a separate window. Search over titles is available in the upper-right corner.

Scroll down to read more!

See our paper for more details.

max_points: 750_000
zoom_balance: .38
point_size: 1.2
alpha: 45
zoom_align: right
source_url: "https://static.nomic.ai/tiles/pubmed"
background_color: "#EFEFEF"
click_function: |
   window.open(`https://pubmed.ncbi.nlm.nih.gov/${datum.pmid}/`, '_blank')
background_options:
  size: 1
  mouseover: true
  opacity: 0.8
zoom:
  bbox:
    x: [-250, 250]
    y: [-250, 250]
tooltip_html: |
  return `<div style="min-width: 240px">${datum.title} <em>${datum.journal}</em> (${datum.year})</div>`
encoding:
  foreground:
    field: labels
    lambda: d => d !== 'unlabeled'
  color:
    field: labels
    range: ["lightgrey", "#B79762", "#009271", "#004D43", "#5B4534", "#E83000", "#008941", "#549E79", "black", "#6F0062", "#006FA6", "#b65141", "#A4E804", "#8FB0FF", "#6B002C", "#3B5DFF", "#1CE6FF", "#FF9408", "#BA0900", "#1B4400", "#D790FF", "#0089A3", "#4FC601", "#00FECF", "#5A0007", "#00C2A0", "#FFB500", "#BC23FF", "#7A4900", "#CC0744", "#C20078", "#0000A6", "#aeaa00", "#FF2F80", "#FF34FF", "#FF4A46", "#FF90C9", "#6508ba", "#C895C5"]
    "domain": ["unlabeled", "microbiology", "neurology", "pediatric", "pharmacology", "physiology", "chemistry", "education", "cancer", "virology", "surgery", "biochemistry", "ophthalmology", "immunology", "rehabilitation", "veterinary", "cardiology", "pathology", "psychiatry", "genetics", "dermatology", "environment", "nutrition", "radiology", "psychology", "engineering", "gynecology", "physics", "infectious", "anesthesiology", "computation", "material", "neuroscience", "nursing", "ecology", "bioinformatics", "healthcare", "ethics", "optics"]

Introduction

More than one million articles in biomedicine and the life sciences are now published every year. The sheer volume of publications makes it difficult to track the evolution of biomedical publishing as a whole. Search engines like PubMed and Google Scholar make it possible to find specific papers given suitable keywords and to follow the citation networks these papers are embedded in, yet none of them lets users explore the biomedical literature ‘landscape’ from a global perspective. This makes it hard to see how research topics evolve over time, how different fields are related to each other, or how new methods and techniques are adopted in different fields.

To answer such questions, we provide a bird’s-eye view of the biomedical literature.

Here we offer an approach that enables all of the above: a global two-dimensional atlas of the biomedical and life science literature based on the abstracts of all 21 million English-language articles in the PubMed database. To create the map, we embedded the abstracts into two dimensions using the transformer-based large language model PubMedBERT combined with the neighbor embedding method t-SNE.

Our map is based on the abstract texts alone; it does not use any further metadata or information on citations or references.

This visualization facilitates exploration of the biomedical literature and can reveal aspects of the data that would not be easily noticed with other analysis methods. We showcase the power of our approach in four examples:

The emergence of the Covid-19 literature
The evolution of different subfields of neuroscience
The uptake of machine learning (upcoming; see paper)
The distribution of gender imbalance across biomedical fields

The shared strategy in all of these is to formulate specific hypotheses about the data based on the visual exploration, and then to confirm them by a dedicated statistical analysis of the original high-dimensional dataset.

We labeled the dataset using 38 keywords that appear in journal titles and reflect the general topic of a paper. We chose these keywords from lists of medical specialties and life science branches, keeping those that appeared frequently in the journal titles in our dataset.

Papers were assigned a label if their journal title contained that term. As a result, about a third of the papers in the dataset received labels.
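The keyword labeling can be sketched in a few lines of JavaScript, the language of the encoding lambdas on this page. This is a minimal illustration, not the authors' code: `KEYWORDS` is a hypothetical subset of the 38 keywords, matching is plain case-insensitive substring search, and letting the first matching keyword win is only one assumed way to handle journal titles that contain several keywords.

```javascript
// Hypothetical subset of the 38 journal-title keywords (the full set is the
// `domain` of the `labels` color encoding, minus "unlabeled").
const KEYWORDS = ["neuroscience", "neurology", "cancer", "surgery", "veterinary", "psychology"];

// Return the first keyword (in list order) contained in the journal title,
// compared case-insensitively, or "unlabeled" if none matches.
// Assumption: first-match-wins is just one way to resolve multiple matches.
function assignLabel(journalTitle, keywords = KEYWORDS) {
  const title = journalTitle.toLowerCase();
  const match = keywords.find((kw) => title.includes(kw));
  return match ?? "unlabeled";
}

console.log(assignLabel("Journal of Neuroscience Methods")); // → neuroscience
console.log(assignLabel("Veterinary Surgery")); // → surgery ("surgery" precedes "veterinary" in KEYWORDS)
console.log(assignLabel("Nature")); // → unlabeled
```

Substring matching is the simplest choice but has side effects: a title such as “Archives of Biochemistry and Biophysics” contains both “biochemistry” and “physics”, so the keyword order decides its label.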

The labels demonstrate that our map has sensible global organization: psychology papers are next to psychiatry papers, optics is next to physics, and so on. Overall, the left part of the map corresponds to life sciences, while the right part corresponds to medicine.

encoding:
  x: 
    field: x
  y: 
    field: y
  foreground:
    field: labels
    lambda: d => d !== 'unlabeled'
  color: 
      field: labels
      range: ["lightgrey", "#B79762", "#009271", "#004D43", "#5B4534", "#E83000", "#008941", "#549E79", "black", "#6F0062", "#006FA6", "#b65141", "#A4E804", "#8FB0FF", "#6B002C", "#3B5DFF", "#1CE6FF", "#FF9408", "#BA0900", "#1B4400", "#D790FF", "#0089A3", "#4FC601", "#00FECF", "#5A0007", "#00C2A0", "#FFB500", "#BC23FF", "#7A4900", "#CC0744", "#C20078", "#0000A6", "#aeaa00", "#FF2F80", "#FF34FF", "#FF4A46", "#FF90C9", "#6508ba", "#C895C5"]
      "domain": ["unlabeled", "microbiology", "neurology", "pediatric", "pharmacology", "physiology", "chemistry", "education", "cancer", "virology", "surgery", "biochemistry", "ophthalmology", "immunology", "rehabilitation", "veterinary", "cardiology", "pathology", "psychiatry", "genetics", "dermatology", "environment", "nutrition", "radiology", "psychology", "engineering", "gynecology", "physics", "infectious", "anesthesiology", "computation", "material", "neuroscience", "nursing", "ecology", "bioinformatics", "healthcare", "ethics", "optics"]
duration: 1000
background_options:
  opacity: [.2, 1]
  size: [.5, 1]
labels:
  url: "https://static.nomic.ai/tiles/pubmed/labels.geojson"
  name: labels
  label_field: labels
  size_field: undefined

anesthesiology

biochemistry

bioinformatics

cancer

cardiology

chemistry

computation

dermatology

ecology

education

engineering

environment

ethics

genetics

gynecology

healthcare

immunology

infectious

material

microbiology

neurology

neuroscience

nursing

nutrition

ophthalmology

optics

pathology

pediatric

pharmacology

physics

physiology

psychiatry

psychology

radiology

rehabilitation

surgery

veterinary

virology

unlabeled

point_size: 1.3
background_options:
  opacity: [.05, 2]
  size: [.1, 2]
duration: 450
alpha: 100
encoding:
  filter: null
  foreground: null
  color:
    field: human
    domain: [0, 1]
    range: viridis

The global structure is also well captured by categories that the iCite project assigns based on MeSH subject headings. iCite looks at all MeSH headings of an article and classifies it by the share of headings related to humans, to animal studies, or to molecular and cellular biology. The right half of the map is human medicine, while the left half is split between animal and biochemical studies.

molecular_cellular

animal

human

While we use journal titles to assign labels, the data underlying the map itself are the abstract texts. Here we color the map by the length of each abstract (darker color: shorter abstracts; lighter color: longer abstracts). This, too, shows regional patterns, with some disciplines preferring longer abstracts than others.

background_options:
  opacity: [.2, 1]
  size: [.5, 1]
point_size: 1.2
labels: null
duration: 3000
encoding:
  filter: null
  x:
    field: x
    transform: literal
  y: 
    field: y
    transform: literal
  color:
    field: abstract_length
    domain: [0, 500]
    range: magma
  foreground: null

Abstract lengths do not follow a smooth distribution: instead, they cluster at round numbers such as 100, 150, 200, and 250 words, likely reflecting word limits in journals’ submission guidelines.
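A sketch of how such peaks can be found, assuming words are counted by splitting on whitespace (the tokenization behind the map's `abstract_length` field is not specified here). `wordCount`, `mostCommonLengths`, and the toy abstracts are illustrative names and data.

```javascript
// Count words by splitting on runs of whitespace (an assumed tokenization).
function wordCount(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Histogram of abstract lengths; returns the k most frequent lengths,
// ordered by count (descending), ties broken by length (ascending).
function mostCommonLengths(abstracts, k = 3) {
  const counts = new Map();
  for (const abstract of abstracts) {
    const n = wordCount(abstract);
    counts.set(n, (counts.get(n) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0] - b[0])
    .slice(0, k)
    .map(([length, count]) => ({ length, count }));
}

// Toy data: abstracts made of repeated placeholder words.
const toyAbstract = (n) => Array(n).fill("word").join(" ");
const toy = [toyAbstract(200), toyAbstract(200), toyAbstract(150), toyAbstract(173)];
console.log(mostCommonLengths(toy, 2)); // → [{ length: 200, count: 2 }, { length: 150, count: 1 }]
```

On real data, a word limit shows up as a sharp spike at exactly the limit rather than a smooth bump around it.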

point_size: 1.2
duration: 3000
encoding:
  x: 
    field: abstract_length.x
    transform: literal
  y: 
    field: abstract_length.y
    transform: literal
  foreground: null
  color:
    field: abstract_length
    domain: [0, 500]
    range: magma
labels:
  name: abstract lengths
  labels:
    - {text: '200 words: 170,806 abstracts', x: -47.5837670871576, y: -154.5403565148235}
    - {text: '150 words', x: -122.98567376541476, y: -118.22208002322414}
    - {text: '250 words', x: 1.1658502332687704, y: -83.1412289407424}
    - {text: '100 words: 75,265 abstracts', x: -145.88492702060688, y: 10.953296382699634}

The majority of the displayed papers were published between 1970 and 2021. Here darker colors correspond to earlier publication years and lighter colors correspond to more recent papers.

encoding:
  x: 
    field: time.x
    transform: literal
  y: 
    field: time.y
    transform: literal
  foreground: null
  color:
    field: year
    domain: [1970, 2022]
    range: viridis
labels:
  name: publication years
  labels:
    - {text: '2020: 1.41m', x: 241.5698747443659, y: -158.4085756806052}   
    - {text: '1975: 82k', x: -241.5698747443659, y: 100.4085756806052}

Our map, however, is not predominantly organized by time: most regions contain articles from several different eras in fairly close proximity.

labels: null
encoding:
  foreground: null
  x: 
    field: x
    transform: literal
  y: 
    field: y
    transform: literal

When zooming in, however, temporal periods often become well segregated. Within most individual fields, the temporal division is very strong: in immunology and virology, for example, recent abstracts are much more similar to each other than to abstracts from the 1970s and 1980s in the same fields.

point_size: 1.2
alpha: 40

zoom:
  bbox: {"x":[-114.44657259301789,-35.94570776773757],"y":[-58.83688556079831,9.067210254589938]}
encoding:
  filter: null
  x: 
    field: x
    transform: literal
  y:
    field: y
    transform: literal
  color:
    field: year
    domain: [1975, 2025]
    range: viridis
  jitter_radius: null

COVID-19

Strikingly, one area of the map contains only papers from 2020–21. These are papers on Covid-19.

We considered a paper Covid-related if it contained phrases like “Covid-19” or “SARS-CoV-2” in the abstract text. Our dataset includes 132 thousand Covid-related papers, most of which are concentrated in this area.
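A minimal sketch of this phrase test. The text names only “Covid-19” and “SARS-CoV-2”; the spelling variants and the extra “2019-nCoV” alternative in the hypothetical `COVID_PATTERN` are assumptions.

```javascript
// Matches "covid-19", "covid19", "covid 19", "sars-cov-2", "sars cov 2", and the
// assumed extra variant "2019-ncov", case-insensitively.
// No `g` flag: a global regex keeps `lastIndex` state between .test() calls.
const COVID_PATTERN = /\b(?:covid[- ]?19|sars[- ]cov[- ]2|2019[- ]ncov)\b/i;

function isCovidRelated(abstract) {
  return COVID_PATTERN.test(abstract);
}

console.log(isCovidRelated("Outcomes of hospitalized COVID-19 patients")); // → true
console.log(isCovidRelated("SARS-CoV-2 spike protein binding")); // → true
console.log(isCovidRelated("SARS-CoV transmission in 2003")); // → false
```

Requiring the “-2” suffix keeps papers on the original 2003 SARS-CoV out of the Covid set.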

See our paper for direct evidence that the Covid literature formed an unprecedentedly tight research cluster.

duration: 4000
alpha: 80
encoding:
  filter: null
  x:
    field: x
    transform: literal
  y:
    field: y
    transform: literal
  color:
    field: year
    domain: [1975, 2022]
    range: viridis
zoom:
  bbox: {"x":[-26.113317446654943,16.705487505186465],"y":[47.62328228197863,84.66201097142243]}

We can group the Covid papers based on the presence of specific keywords in their title. All different kinds of Covid-related research appear in this cluster in microcosm, from treatment and epidemiology at the top, to social and family-related issues at the bottom.
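The `covid_label` field used for this coloring can be read as the output of a keyword grouping like the one below. This is a hypothetical sketch: `COVID_KEYWORDS` is a subset of the label domain, and the meaning of the empty string and of “Covid unlabeled” is inferred from the `filter` and `foreground` lambdas rather than documented.

```javascript
// A few of the title keywords from the `covid_label` color domain (hypothetical subset).
const COVID_KEYWORDS = ["Vaccine", "Treatment", "Mental", "Students", "Transmission"];

// Assumed semantics, read off the encoding lambdas: '' marks non-Covid papers
// (removed by the filter `d => d !== ''`), and "Covid unlabeled" marks Covid
// papers whose title contains none of the keywords (kept, but not in the foreground).
function covidLabel(title, covidRelated, keywords = COVID_KEYWORDS) {
  if (!covidRelated) return "";
  const lower = title.toLowerCase();
  const hit = keywords.find((kw) => lower.includes(kw.toLowerCase()));
  return hit ?? "Covid unlabeled";
}

// The same predicates the encoding uses for `filter` and `foreground`.
const keepPoint = (d) => d !== "";
const inForeground = (d) => d !== "Covid unlabeled";

const label = covidLabel("COVID-19 vaccine hesitancy among nurses", true);
console.log(label, keepPoint(label), inForeground(label)); // → Vaccine true true
```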

Vaccine research appears in two completely distinct regions: one covering the scientific effort to develop and test vaccines, and the other (towards the bottom) covering the public health effort to get people vaccinated once vaccines were widely available.

point_size: 1.2
labels:
  url: "https://static.nomic.ai/tiles/pubmed/covid_label.geojson"
  name: covid_label
  label_field: covid_label
  size_field: undefined
encoding:
  filter:
    field: covid_label
    lambda: d => d !== ''
  foreground:
    field: covid_label
    lambda: |
      d => d !== 'Covid unlabeled'
  color:
    field: covid_label
    domain: ["Covid unlabeled", "Antibody", "Anxiety", "Cancer", "Children", "Clinical", "Epidemic", "Healthcare", "Immune", "Implications", "Mental", "Mortality", "Outbreak", "Pediatric", "Pneumonia", "Population", "Psychological", "Respiratory", "Social", "Strategies", "Students","Surgery", "Symptoms", "Therapy", "Transmission", "Treatment", "Vaccine", "Workers"]
    range: ["lightgrey", "#B79762", "#009271", "#004D43", "#5B4534", "#E83000", "#008941", "#549E79", "black", "#6F0062", "#006FA6", "#b65141", "#A4E804", "#8FB0FF", "#6B002C", "#3B5DFF", "#1CE6FF", "#FF9408", "#BA0900", "#1B4400", "#D790FF", "#0089A3", "#4FC601", "#00FECF", "#5A0007", "#00C2A0", "#FFB500", "#BC23FF", "#7A4900", "#CC0744", "#C20078", "#0000A6", "#aeaa00", "#FF2F80", "#FF34FF", "#FF4A46", "#FF90C9", "#6508ba", "#C895C5"]

Antibody

Anxiety

Cancer

Children

Clinical

Epidemic

Healthcare

Immune

Implications

Mental

Mortality

Outbreak

Pediatric

Pneumonia

Population

Psychological

Respiratory

Social

Strategies

Students

Surgery

Symptoms

Therapy

Transmission

Treatment

Vaccine

Workers

We can also see how the focus of Covid publications shifted with time during 2020–2021. Early papers are predominantly clinical, while research on societal implications and vaccine hesitancy appeared later.
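The date bounds in the encoding for this step are Unix timestamps in milliseconds. The sketch below, with the illustrative helpers `toMillis`, `between`, and `colorPosition`, shows how they relate to calendar dates; treating `op: between` as inclusive at both ends and mapping dates linearly onto the color domain are assumptions.

```javascript
// Convert a calendar date (UTC midnight) to Unix time in milliseconds.
const toMillis = (isoDate) => Date.parse(`${isoDate}T00:00:00Z`);

// Assumed reading of `op: between` with bounds a and b: inclusive on both ends.
const between = (value, a, b) => value >= a && value <= b;

// Assumed linear mapping of a timestamp into the color domain [d0, d1], clamped to [0, 1].
const colorPosition = (t, d0, d1) => Math.min(1, Math.max(0, (t - d0) / (d1 - d0)));

console.log(toMillis("2019-11-01")); // → 1572566400000 (the filter's lower bound a)
console.log(toMillis("2022-06-01")); // → 1654041600000 (the end of the color domain)
console.log(between(toMillis("2020-06-23"), 1572566400000, 1682566400000)); // → true
```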

duration: 2000
point_size: 1.4
zoom:
  bbox: {"x":[-26.113317446654943,16.705487505186465],"y":[47.62328228197863,84.66201097142243]}
encoding:
  filter: 
      field: covid_label
      lambda: |
        d => d !== ''
  filter2:
    field: date
    op: between
    a: 1572566400000
    b: 1682566400000
  foreground: null
  color:
    field: date
    domain: [1572566400000, 1654041600000]
    range: viridis

Neuroscience

Neuroscience papers form two large regions of the map: one in the upper part and one in the lower part.

max_points: 1000000
zoom_balance: .38
point_size: 2
labels: null
alpha: 475
duration: 2000
zoom:
  bbox:
    x: [-250, 250]
    y: [-250, 250]
encoding:
  foreground: null
  filter2: null
  filter:
    field: labels
    lambda: |
      d => d === 'neuroscience'
  color:
    field: labels
    range: ["lightgrey", "#B79762", "#009271", "#004D43", "#5B4534", "#E83000", "#008941", "#549E79", "black", "#6F0062", "#006FA6", "#b65141", "#A4E804", "#8FB0FF", "#6B002C", "#3B5DFF", "#1CE6FF", "#FF9408", "#BA0900", "#1B4400", "#D790FF", "#0089A3", "#4FC601", "#00FECF", "#5A0007", "#00C2A0", "#FFB500", "#BC23FF", "#7A4900", "#CC0744", "#C20078", "#0000A6", "#aeaa00", "#FF2F80", "#FF34FF", "#FF4A46", "#FF90C9", "#6508ba", "#C895C5"]
    "domain": ["unlabeled", "microbiology", "neurology", "pediatric", "pharmacology", "physiology", "chemistry", "education", "cancer", "virology", "surgery", "biochemistry", "ophthalmology", "immunology", "rehabilitation", "veterinary", "cardiology", "pathology", "psychiatry", "genetics", "dermatology", "environment", "nutrition", "radiology", "psychology", "engineering", "gynecology", "physics", "infectious", "anesthesiology", "computation", "material", "neuroscience", "nursing", "ecology", "bioinformatics", "healthcare", "ethics", "optics"]

Coloring neuroscience papers by some of the prominent terms appearing in their titles, we see that the upper part encompasses research on cellular and molecular neuroscience, whereas the lower part contains literature on behavioural and computational neuroscience.

duration: 2000
labels:
  url: "https://static.nomic.ai/tiles/pubmed/neuroscience_label.geojson"
  name: neuroscience_label
  label_field: neuroscience_label
  size_field: undefined
encoding:
  foreground:
    field: neuroscience_label
    lambda: |
      d => d !== 'unlabeled' && d !== '' && d !== "Neuroscience unlabeled"
  color:
    field: neuroscience_label
    range: dark2

Coloring papers by publication year suggests that neuroscience originated as a study of cellular and molecular mechanisms, and later broadened to include behavioural and computational research.

See direct quantifications of this effect in our paper.

encoding:
  filter2:
    field: year
    op: between
    a: 1970
    b: 2021
  color:
    field: year
    domain: [1970, 2021]
    range: viridis

Gender bias

We inferred the gender of each paper's first author from their first name. Coloring the map by inferred gender shows which research fields have more male or more female first authors.
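The inference step can be sketched as a lookup from first name to gender. The tool and name list actually used are not described here, so `NAME_TABLE` is a toy, hypothetical table; the sketch only shows the shape of the mapping onto the `'male'`, `'female'`, and `'unknown'` values of the `GenderFirstAuthor` field.

```javascript
// Toy, hypothetical lookup table; a real analysis would need a large
// name-to-gender resource. Values match the GenderFirstAuthor categories.
const NAME_TABLE = new Map([
  ["maria", "female"],
  ["anna", "female"],
  ["john", "male"],
  ["david", "male"],
]);

// Use the first token of the first author's forename. Bare initials such as
// "J." carry no usable information, so they map to 'unknown', as do names
// missing from the table.
function inferGender(forename) {
  const first = forename.trim().split(/\s+/)[0];
  const key = first.toLowerCase().replace(/\.$/, "");
  if (key.length <= 1) return "unknown";
  return NAME_TABLE.get(key) ?? "unknown";
}

console.log(inferGender("Maria Teresa")); // → female
console.log(inferGender("J. R.")); // → unknown
console.log(inferGender("Kim")); // → unknown (not in the toy table)
```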

duration: 1000
point_size: 1.4
labels: null
zoom:
  bbox:
    x: [-250, 250]
    y: [-250, 250]
encoding:
  filter: null
  filter2: null
  foreground: null
  color:
    field: GenderFirstAuthor
    domain: ['unknown', 'male', 'female']
    range: ["#f5f5f5", "#1f77b4", "#ff7f0e"]

Women

Men

Both

Some areas are dominated by either female or male first authors. Here are some examples:

Female-dominated area: contraceptive use

Male-dominated area: shoulder arthroscopy

Some individual disciplines showed substantial heterogeneity in gender ratios. For example, the map of healthcare papers contains both male- and female-dominated regions: one of the more male-dominated clusters focused on financial management, while one of the more female-dominated clusters focused on patient care.

point_size: 1.4
alpha: 100
zoom:
  bbox: {"x":[45.0, 95.0],"y":[170.0, 210.0]}
encoding:
  filter: 
    field: labels
    lambda: d => d === 'healthcare'
  foreground:
    field: GenderFirstAuthor
    lambda: "d => d !== 'unknown'"
  x: 
    field: x
    transform: literal
  y:
    field: y
    transform: literal
  color:
    field: GenderFirstAuthor
    domain: ['male', 'female', 'unknown']
    range: ["#1f77b4", "#ff7f0e", "#f5f5f5"]
  jitter_radius: null

labels:
  name: healthcare
  labels: 
    - text: finances
      x: 71.05457651874438
      y: 198.39371158959952
    - text: patient care
      x: 68.47492000400109
      y: 179.31588838556968

In education, female authors dominated research on nursing training; male authors were more frequent in research on medical training.

point_size: 1.4
alpha: 100
zoom:
  bbox: { "x": [65.0, 135.0], "y":[150.0, 200.0] }
encoding:
  filter: 
    field: labels
    lambda: d => d === 'education'
  foreground:
    field: GenderFirstAuthor
    lambda: "d => d !== 'unknown'"
  x: 
    field: x
    transform: literal
  y:
    field: y
    transform: literal
  jitter_radius: null
labels:
  name: education
  labels: 
    - text: nurse education
      x: 91.05457651874438
      y: 164
    - text: doctor education
      x: 119
      y: 169

In surgery, only 24% of the first authors were female, but this fraction increased to 61% in the cluster of papers on veterinary surgery.
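A percentage like this can be computed as below. Whether papers with an `'unknown'` first-author gender were excluded from the denominator is not stated here; this sketch assumes they were. The sample data are synthetic.

```javascript
// Share of female first authors among papers whose first-author gender is known.
// Assumption: 'unknown' entries are excluded from the denominator.
function femaleShare(genders) {
  const known = genders.filter((g) => g !== "unknown");
  if (known.length === 0) return NaN; // no information to compute a share
  return known.filter((g) => g === "female").length / known.length;
}

// Synthetic example: 24 female, 76 male, and 1 unknown first author.
const synthetic = [...Array(24).fill("female"), ...Array(76).fill("male"), "unknown"];
console.log(femaleShare(synthetic)); // → 0.24
```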

point_size: 1.4
alpha: 100
zoom:
  bbox: {"x":[137,173],"y":[-87,-59]}
encoding:
  filter: 
    field: labels
    lambda: d => d === 'surgery'
  foreground:
    field: GenderFirstAuthor
    lambda: "d => d !== 'unknown'"
  x: 
    field: x
    transform: literal
  y:
    field: y
    transform: literal
  color:
    field: GenderFirstAuthor
    domain: ['male', 'female', 'unknown']
    range: ["#1f77b4", "#ff7f0e", "#f5f5f5"]
  jitter_radius: null
labels:
  name: surgery
  labels:
    - text: veterinary surgery
      x: 153
      y: -72
    - text: heart surgery
      x: 163
      y: -76

Retractions

Text similarity of this kind can also help identify problematic large-scale patterns. Several specific areas, in particular at the top of the map, contain concentrations of retracted papers; they cover research on cancer-related drugs, marker genes, and microRNA. These topics are known targets of paper mills: for-profit organizations that produce fraudulent research papers and sell authorship on them.

background_options:
  size: [.5, 3]
  mouseover: true
  opacity: [0.5, 1e10]
encoding:
  color:
    field: retracted
    domain: [.5, 1.2]
    range: 'magma'
  filter: null
  foreground:
    field: retracted
    op: gt
    a: 0
zoom:
  bbox:
    x: [-250, 250]
    y: [-300, 300]

In the paper, we investigate one such region with a particularly high fraction of retracted papers (48 of 443). Most of the other papers in this area share a similar title format (variations of “MicroRNA-X does Y by targeting Z in osteosarcoma”), paper structure, and figure style, and 24/25 of them had authors affiliated with Chinese hospitals (some of which provide promotions or pay increases for publications without providing substantial laboratory support).

Regions like this merit closer attention.
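A sketch of the two checks described above: the share of retracted papers inside a rectangular map region, and a rough test for the quoted title template. The point format `{ x, y, retracted }` is an assumption modeled on the fields used in the encodings, with `retracted > 0` meaning retracted as in the `op: gt, a: 0` rule; the regular expression is an illustrative approximation, not the analysis from the paper.

```javascript
// Hypothetical point records: map coordinates plus a numeric `retracted` value,
// treated as retracted when > 0 (mirroring the `op: gt, a: 0` foreground rule).
function retractionStats(points, bbox) {
  const [x0, x1] = bbox.x;
  const [y0, y1] = bbox.y;
  const inside = points.filter((p) => p.x >= x0 && p.x <= x1 && p.y >= y0 && p.y <= y1);
  const retracted = inside.filter((p) => p.retracted > 0).length;
  return {
    total: inside.length,
    retracted,
    fraction: inside.length > 0 ? retracted / inside.length : 0,
  };
}

// Rough check for the title template quoted in the text,
// "MicroRNA-X does Y by targeting Z in osteosarcoma".
const TEMPLATE = /^microrna-\S+ .+ by targeting .+ in osteosarcoma/i;
const matchesTemplate = (title) => TEMPLATE.test(title);

console.log(matchesTemplate("MicroRNA-21 promotes proliferation by targeting PTEN in osteosarcoma")); // → true
console.log((48 / 443).toFixed(3)); // → 0.108, the fraction quoted in the text
```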

zoom:
  bbox:
    x: [-40.37718386886921,-39.037601592113646]
    y: [-215.68617863463385,-214.18734531798427]

Linking data

PubMed IDs are universal identifiers, which makes it easy to link our map with other data sources. Here, for example, we color each paper by its citation count.
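The citation coloring uses `transform: log` with `domain: [0.1, 100]`. A sketch of the corresponding normalization, assuming a base-10 log scale clamped to the domain: starting the domain at 0.1 rather than 0 avoids taking the logarithm of zero. How the plotting library itself handles zero counts is an assumption, and `logScale` is a hypothetical helper.

```javascript
// Position of a citation count on a log10 color scale with domain [d0, d1],
// clamped so that values below d0 (including 0) map to 0 and above d1 to 1.
function logScale(value, [d0, d1] = [0.1, 100]) {
  const v = Math.min(Math.max(value, d0), d1);
  return (Math.log10(v) - Math.log10(d0)) / (Math.log10(d1) - Math.log10(d0));
}

console.log(logScale(0)); // → 0
console.log(logScale(100)); // → 1
console.log(logScale(10).toFixed(3)); // → 0.667
```

On a log scale, papers with 1, 10, and 100 citations are evenly spaced in color, so the many rarely cited papers do not all collapse into the same shade.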

background_options:
  size: [.8, 1]
  mouseover: true
  opacity: .8
encoding: 
  filter: null
  filter2: null
  foreground: null
  color:
    field: citation_count
    range: viridis
    domain: [0.1, 100]
    transform: log

Search using PubMed APIs

If you have a specific list of PubMed IDs separated by commas, spaces, or newlines (or any combination), you can enter them into the search box below to highlight them on the map. Note that you may need to zoom in to see all the points.

37205355, 37051467, 36993635, 36993532, 36712140, 36711462, 36660179, 36610997, 36610989, 36494337, 36094956, 35607693, 35440782, 35273392, 35143775, 35132262, 34826233, 34616073, 34616066, 34417615, 34211145, 34103073, 33795888, 33676365, 33659468, 33303831, 33247933, 32942485, 32909008, 32701060, 32484809, 32176273, 32070398, 31873215, 31626771, 31455877, 31162708, 31073610, 30682053, 30664774, 30654736, 30566439, 30290149, 29967537, 29897334, 29650040, 28581496, 28545501, 28501650, 28448519, 27905880, 27875323, 27824834, 27621057, 27504780, 27230763, 27043002, 26683605, 26541607, 26531823, 26174866, 26061751, 26024968, 25765649, 25332375, 24936470, 24824901, 24558263, 24442673, 24314033, 23788555, 23677943, 23360652, 23222703, 23160280, 22815359, 22398619, 22383036, 22113004, 22099972, 21908772, 21728862, 21697122, 21642536, 21642531, 21410973, 21226895, 23089814, 21042592, 20856582, 20718980, 20686598, 20605923, 20436464, 20351773, 20052417, 19478997, 19289445, 18796475, 18779558, 18682743, 18447942, 18437230, 18353788, 17994088, 17994087, 17988176, 17874271, 17764440, 17571346, 17567995, 17538628, 17499477, 17433106, 17237099, 16789815, 16651369, 16600017, 16458514, 16354754, 16280538, 16110337, 16108706, 15911755, 15545499, 15534224, 15534223, 15215394, 15060015, 15060012, 15060007, 15059998, 15057822, 14988105, 14534169, 12935341, 12824358, 12824355, 12618381, 12610304, 12529311, 12529308, 12466850, 12015888, 12002220, 11997350, 11751581
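One way to split input like this, sketched in JavaScript: any run of commas and whitespace counts as a separator. Discarding non-numeric tokens and duplicates is an extra assumption of this sketch, and `parsePmids` is a hypothetical name; the search box on this page may behave differently.

```javascript
// Split on any run of commas and/or whitespace (spaces, tabs, newlines),
// keep all-digit tokens only, and drop duplicates while preserving order.
function parsePmids(text) {
  const tokens = text.split(/[\s,]+/).filter((t) => t.length > 0);
  return [...new Set(tokens.filter((t) => /^\d+$/.test(t)))];
}

console.log(parsePmids("37205355, 37051467\n36993635 ,,36993635"));
// → [ '37205355', '37051467', '36993635' ]
```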

Search PubMed Entrez field:

Lior Pachter

Eric Lander

Anthony Fauci

Elizabeth Blackburn

Craig Venter

Sydney Brenner

Francis Collins

Solomon H. Snyder

Randy Schekman

Emmanuelle Charpentier

Jennifer Doudna
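Entrez searches like these can be run against NCBI's E-utilities ESearch endpoint, which returns the PMIDs of matching records. The sketch builds such a request with the documented `db`, `term`, `retmax`, and `retmode` parameters; whether this page issues exactly this request is an assumption, and `esearchUrl` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds an ESearch URL for PubMed. `db`, `term`,
// `retmax`, and `retmode` are documented ESearch parameters.
function esearchUrl(term, retmax = 10000) {
  const url = new URL("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi");
  url.searchParams.set("db", "pubmed");
  url.searchParams.set("term", term); // e.g. "Jennifer Doudna[Author]"
  url.searchParams.set("retmax", String(retmax)); // cap on the number of PMIDs returned
  url.searchParams.set("retmode", "json");
  return url.toString();
}

console.log(esearchUrl("Lior Pachter[Author]"));

// With network access (Node 18+ or a browser), the PMIDs are in esearchresult.idlist:
// const res = await fetch(esearchUrl("Jennifer Doudna[Author]"));
// const pmids = (await res.json()).esearchresult.idlist;
```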

duration: 5000
point_size: 1.2
alpha: 45
labels: null
encoding:
  filter: null
  filter2: null
  x:
    field: x
    transform: literal
  y:
    field: y
    transform: literal
  foreground: null
  color:
    field: year
    range: viridis
    domain: [1975, 2023]

zoom:
  bbox:
    x: [-250, 250]
    y: [-250, 250]

Alter point sizes for selected and unselected points.

BERT model vs. TF-IDF

We also produced a two-dimensional map based on a bag-of-words representation of the PubMed abstracts with TF-IDF weighting, instead of the PubMedBERT model. This resulted in worse separation of our journal-based labels, so we used the PubMedBERT approach for all the visualizations above. Please see the paper for a more detailed comparison.
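For reference, a minimal TF-IDF sketch: each abstract becomes a sparse vector with weight `tf(t, d) * ln(N / df(t))`, using raw term counts and the natural log. This is one common textbook variant, assumed here for illustration; the weighting, tokenization, and any further preprocessing used for the TF-IDF map are described in the paper, not in this sketch.

```javascript
// Minimal TF-IDF: weight(t, d) = tf(t, d) * ln(N / df(t)), where tf is the raw
// count of term t in document d, df(t) the number of documents containing t,
// and N the number of documents. Tokens: lowercase, split on non-word characters.
function tfidf(docs) {
  const tokenized = docs.map((d) => d.toLowerCase().split(/\W+/).filter(Boolean));
  const N = tokenized.length;

  // Document frequency: count each term at most once per document.
  const df = new Map();
  for (const tokens of tokenized) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) ?? 0) + 1);
  }

  // One sparse vector (Map from term to weight) per document.
  return tokenized.map((tokens) => {
    const tf = new Map();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    const weights = new Map();
    for (const [t, count] of tf) weights.set(t, count * Math.log(N / df.get(t)));
    return weights;
  });
}

const docWeights = tfidf(["gene expression in cancer", "gene therapy", "cancer surgery"]);
console.log(docWeights[1].get("therapy").toFixed(3)); // → 1.099 (ln 3: term in 1 of 3 documents)
console.log(docWeights[1].get("gene").toFixed(3)); // → 0.405 (ln 1.5: term in 2 of 3 documents)
```

Terms that occur in every abstract get weight zero, so this representation emphasizes distinctive vocabulary, but unlike a language model it treats different words with similar meanings as unrelated.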

Here you can switch between the PubMedBERT-based and the TF-IDF-based maps.

duration: 5000
point_size: 1.2
alpha: 45
labels: null
encoding:
  filter: null
  filter2: null
  x:
    field: x
    transform: literal
  y:
    field: y
    transform: literal
  foreground:
    field: labels
    lambda: d => d !== 'unlabeled'
  color:
    field: labels
    range: ["lightgrey", "#B79762", "#009271", "#004D43", "#5B4534", "#E83000", "#008941", "#549E79", "black", "#6F0062", "#006FA6", "#b65141", "#A4E804", "#8FB0FF", "#6B002C", "#3B5DFF", "#1CE6FF", "#FF9408", "#BA0900", "#1B4400", "#D790FF", "#0089A3", "#4FC601", "#00FECF", "#5A0007", "#00C2A0", "#FFB500", "#BC23FF", "#7A4900", "#CC0744", "#C20078", "#0000A6", "#aeaa00", "#FF2F80", "#FF34FF", "#FF4A46", "#FF90C9", "#6508ba", "#C895C5"]
    "domain": ["unlabeled", "microbiology", "neurology", "pediatric", "pharmacology", "physiology", "chemistry", "education", "cancer", "virology", "surgery", "biochemistry", "ophthalmology", "immunology", "rehabilitation", "veterinary", "cardiology", "pathology", "psychiatry", "genetics", "dermatology", "environment", "nutrition", "radiology", "psychology", "engineering", "gynecology", "physics", "infectious", "anesthesiology", "computation", "material", "neuroscience", "nursing", "ecology", "bioinformatics", "healthcare", "ethics", "optics"]
zoom:
  bbox:
    x: [-250, 250]
    y: [-250, 250]

TF-IDF

PubMedBERT

duration: 5000
point_size: 1.2
alpha: 45
labels: null
encoding:
  filter: null
  filter2: null
  x:
    field: tfidf.x
    transform: literal
  y:
    field: tfidf.y
    transform: literal
  foreground:
    field: labels
    lambda: d => d !== 'unlabeled'
  color:
    field: labels
    range: ["lightgrey", "#B79762", "#009271", "#004D43", "#5B4534", "#E83000", "#008941", "#549E79", "black", "#6F0062", "#006FA6", "#b65141", "#A4E804", "#8FB0FF", "#6B002C", "#3B5DFF", "#1CE6FF", "#FF9408", "#BA0900", "#1B4400", "#D790FF", "#0089A3", "#4FC601", "#00FECF", "#5A0007", "#00C2A0", "#FFB500", "#BC23FF", "#7A4900", "#CC0744", "#C20078", "#0000A6", "#aeaa00", "#FF2F80", "#FF34FF", "#FF4A46", "#FF90C9", "#6508ba", "#C895C5"]
    "domain": ["unlabeled", "microbiology", "neurology", "pediatric", "pharmacology", "physiology", "chemistry", "education", "cancer", "virology", "surgery", "biochemistry", "ophthalmology", "immunology", "rehabilitation", "veterinary", "cardiology", "pathology", "psychiatry", "genetics", "dermatology", "environment", "nutrition", "radiology", "psychology", "engineering", "gynecology", "physics", "infectious", "anesthesiology", "computation", "material", "neuroscience", "nursing", "ecology", "bioinformatics", "healthcare", "ethics", "optics"]
zoom:
  bbox:
    x: [-250, 250]
    y: [-250, 250]
