How It's Built · March 22, 2026 · 9 min read

State Licensing Board Data: How Healthcare Databases Are Actually Built

The primary source for any legitimate healthcare contact list is state licensing data. Here's how it works, why it's hard, and what it means for data quality.

When a vendor sells you a healthcare contact list, the pitch focuses on size, recency, and deliverability. What they rarely explain is where the data actually came from. The sourcing methodology — more than any other factor — determines the quality of what you're buying.

This is how legitimate healthcare databases are built.

What State Licensing Boards Actually Publish

Every US state requires healthcare practitioners to hold a valid state license. Licensing boards for medicine, chiropractic, nursing, physical therapy, dentistry, and other professions maintain registries of license holders, and in most states those registries are publicly accessible.

What they typically include:

  • Licensee full name
  • License number and status — active, inactive, suspended
  • License issue and expiration dates
  • Business name, where provided
  • Practice address — sometimes phone and website, often not

What they don't include:

  • Email addresses — almost never published
  • Revenue or practice size
  • Employment history or hospital affiliations

Licensing board data gives you the verified foundation: confirmed practitioners with confirmed active licenses and, in many cases, confirmed practice addresses. This is meaningfully different from a business directory that anyone can submit to without verification.
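As a rough sketch of what "verified foundation" means in practice, the filter below keeps only records whose license is active and unexpired. The `LicenseRecord` schema and the sample rows are hypothetical; each board publishes its own field names and formats, so a real pipeline would first map every state's export into something like this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LicenseRecord:
    # Hypothetical normalized schema; real boards each use their own field names.
    full_name: str
    license_number: str
    status: str  # e.g. "active", "inactive", "suspended"
    issued: date
    expires: date
    business_name: Optional[str] = None
    practice_address: Optional[str] = None


def is_usable(record: LicenseRecord, today: date) -> bool:
    """Keep only practitioners whose license is active and not yet expired."""
    return record.status.strip().lower() == "active" and record.expires >= today


# Illustrative rows only, not real licensees.
records = [
    LicenseRecord("Pat Example", "DC-0001", "Active", date(2019, 5, 1), date(2027, 4, 30),
                  "Example Chiropractic", "100 Main St, Springfield"),
    LicenseRecord("Sam Sample", "DC-0002", "suspended", date(2015, 1, 1), date(2026, 12, 31)),
    LicenseRecord("Lee Placeholder", "DC-0003", "active", date(2010, 3, 1), date(2024, 2, 28)),
]

usable = [r for r in records if is_usable(r, today=date(2026, 3, 22))]
print([r.license_number for r in usable])  # ['DC-0001']
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the filter reproducible when a list is rebuilt or audited later.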

The NPI Registry

The National Provider Identifier (NPI) registry is a federal database maintained by CMS. Every healthcare provider who bills Medicare or Medicaid is required to have an NPI. For practical purposes, this covers the vast majority of practicing clinicians.

The NPI registry is publicly accessible and includes:

  • Provider name and NPI number
  • Provider type and taxonomy codes — specialty classification
  • Primary practice address and phone
  • Enumeration date — when the NPI was issued

The NPI registry is particularly valuable because it's nationally consistent — state boards are 50 separate systems with 50 different formats and data quality levels. The taxonomy codes are a reliable proxy for specialty, and organization-level NPIs identify practice entities separately from individual providers.

Limitations: the NPI registry has no email addresses, address accuracy depends on providers keeping their registration updated, and it only covers billable providers — excluding categories like wellness practitioners and health coaches.
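To make the field list above concrete, here is a minimal sketch that flattens one registry-style record into the columns a contact list needs. The nested field names (`basic`, `taxonomies`, `addresses`, `enumeration_type`, and so on) are assumptions modeled on the JSON the registry's public API returns and should be checked against the current API documentation. The sample record is invented for illustration.

```python
from typing import Any, Optional


def flatten_npi_record(raw: dict[str, Any]) -> dict[str, Optional[str]]:
    """Reduce one registry-style record to the fields a contact list needs."""
    basic = raw.get("basic", {})
    taxonomies = raw.get("taxonomies", [])
    addresses = raw.get("addresses", [])

    # Prefer the taxonomy flagged primary; fall back to the first one listed.
    taxonomy = next((t for t in taxonomies if t.get("primary")),
                    taxonomies[0] if taxonomies else {})
    # Use the practice location, not the mailing address.
    location = next((a for a in addresses if a.get("address_purpose") == "LOCATION"), {})

    # Assumed convention: "NPI-2" marks an organization, anything else an individual.
    is_org = raw.get("enumeration_type") == "NPI-2"
    if is_org:
        name = basic.get("organization_name")
    else:
        parts = [basic.get("first_name"), basic.get("last_name")]
        name = " ".join(p for p in parts if p) or None

    return {
        "npi": raw.get("number"),
        "entity": "organization" if is_org else "individual",
        "name": name,
        "taxonomy_code": taxonomy.get("code"),
        "specialty": taxonomy.get("desc"),
        "practice_city": location.get("city"),
        "practice_state": location.get("state"),
        "phone": location.get("telephone_number"),
        "enumeration_date": basic.get("enumeration_date"),
    }


# Invented sample record; the NPI number is a placeholder.
sample = {
    "number": "0000000000",
    "enumeration_type": "NPI-1",
    "basic": {"first_name": "PAT", "last_name": "EXAMPLE", "enumeration_date": "2012-06-01"},
    "taxonomies": [{"code": "111N00000X", "desc": "Chiropractor", "primary": True}],
    "addresses": [
        {"address_purpose": "MAILING", "city": "ELSEWHERE", "state": "IL"},
        {"address_purpose": "LOCATION", "city": "SPRINGFIELD", "state": "IL",
         "telephone_number": "555-0100"},
    ],
}

flat = flatten_npi_record(sample)
print(flat["name"], flat["specialty"], flat["practice_city"])
```

Note that the flattened record has no email key at all, which mirrors the limitation described above: the registry gives you identity and specialty, and everything else comes from enrichment.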

The Enrichment Step

Starting from licensing board or NPI data, you have names, addresses, license numbers, and taxonomy codes. No email addresses. Incomplete phone data. This is where enrichment comes in.

Enrichment is the process of finding additional contact information for records that don't have it. In the context of healthcare B2B data, this typically means:

  • Finding the practice website from the business name and address
  • Extracting contact information from the practice website
  • Finding associated email addresses from contact pages and public listings
  • Verifying phone numbers against business directory listings

The quality of enrichment determines the email hit rate. A well-enriched database in a practice-heavy category like chiropractic or MedSpa might achieve 50–65% email coverage. Categories where practices maintain weaker web presences will produce lower coverage, and reporting that lower number is honest, not a failure.
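Email coverage is simply the share of records in a category that have a populated email field. A minimal sketch of the calculation follows; the record layout and the sample rows are assumptions for illustration, not figures from any real dataset.

```python
from collections import defaultdict


def email_coverage(records: list[dict]) -> dict[str, float]:
    """Return the fraction of records with a non-empty email, per category."""
    totals: dict[str, int] = defaultdict(int)
    with_email: dict[str, int] = defaultdict(int)
    for rec in records:
        category = rec["category"]
        totals[category] += 1
        # Treat missing, None, and whitespace-only values as "no email found".
        if (rec.get("email") or "").strip():
            with_email[category] += 1
    return {category: with_email[category] / totals[category] for category in totals}


# Illustrative rows only.
rows = [
    {"category": "chiropractic", "email": "office@example-chiro.test"},
    {"category": "chiropractic", "email": None},
    {"category": "chiropractic", "email": "info@example-spine.test"},
    {"category": "chiropractic", "email": "  "},
    {"category": "physical therapy", "email": ""},
    {"category": "physical therapy"},
]

coverage = email_coverage(rows)
print(coverage)  # {'chiropractic': 0.5, 'physical therapy': 0.0}
```

Reporting the rate per category, rather than one blended number, is what lets a buyer see where coverage is genuinely thin.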

What good enrichment doesn't do is guess at email addresses based on assumed domain patterns. Guessed addresses fail at high rates and inflate reported coverage without being deliverable. Legitimate enrichment finds addresses that are actually published, not predicted.
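The "published, not predicted" rule can be sketched with the standard library alone: collect only addresses that appear in a page's `mailto:` links or visible text, and never construct one from a name and a domain. The class name, regex, and sample HTML below are illustrative assumptions. A production extractor would also need to fetch pages politely and handle obfuscated addresses.

```python
import re
from html.parser import HTMLParser

# Deliberately simple pattern; it is a sketch, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PublishedEmailParser(HTMLParser):
    """Collect emails that are literally present in mailto links or visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: set[str] = set()
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().startswith("mailto:"):
                # Drop any "?subject=..." suffix before matching.
                address = href[len("mailto:"):].split("?", 1)[0]
                self.emails.update(m.lower() for m in EMAIL_RE.findall(address))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.emails.update(m.lower() for m in EMAIL_RE.findall(data))


def extract_published_emails(html: str) -> list[str]:
    parser = PublishedEmailParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.emails)


# Invented contact page for illustration.
page = """
<a href="mailto:Front.Desk@example-clinic.test?subject=Hi">Email us</a>
<p>Billing: billing@example-clinic.test.</p>
<script>var x = "tracker@ads.test";</script>
"""

print(extract_published_emails(page))
# ['billing@example-clinic.test', 'front.desk@example-clinic.test']
```

If a page publishes no address, the function returns an empty list and the record's email field stays empty, which is the behavior the coverage numbers above depend on.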

Why This Beats Directory Scraping

The alternative to building from licensing data is scraping public business directories directly. This is faster, cheaper, and produces worse data for several structural reasons:

  • Self-reported, not verified. Anyone can list a business on most directories. Category tags are whatever the owner entered — “medical spa” could mean anything.
  • Age and accuracy. Directories update when owners update them. Most don't. Addresses and phone numbers drift out of date.
  • No licensing cross-reference. A directory listing doesn't tell you whether the practice has an active license or the owner has changed.
  • Category pollution. The informal categories used by most directories are broad enough that “chiropractic” might return massage therapists and physical therapists alongside DCs.

Licensing-first databases start with verified, categorized practitioners and then enrich to find contact information. Directory-first databases start with self-reported listings and hope the categories are accurate.
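The missing licensing cross-reference can be sketched as a join between directory listings and licensed records. The version below matches on a normalized business name plus five-digit ZIP, which is a deliberate simplification: the field names and sample rows are hypothetical, and real matching usually needs fuzzier comparison of names and addresses.

```python
import re


def match_key(name: str, postal_code: str) -> tuple[str, str]:
    """Normalize a business name and ZIP into a comparable key."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())  # drop punctuation
    cleaned = re.sub(r"\s+", " ", cleaned).strip()      # collapse whitespace
    return cleaned, postal_code.strip()[:5]             # ZIP+4 -> 5-digit ZIP


def cross_reference(listings: list[dict], licensed: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split listings into those backed by a license record and those that are not."""
    index = {match_key(r["business_name"], r["postal_code"]): r for r in licensed}
    verified, unverified = [], []
    for listing in listings:
        hit = index.get(match_key(listing["name"], listing["postal_code"]))
        if hit:
            verified.append({**listing, "license_number": hit["license_number"]})
        else:
            unverified.append(listing)
    return verified, unverified


# Illustrative rows only.
licensed = [
    {"business_name": "Example Chiropractic, LLC", "postal_code": "62701-1234",
     "license_number": "DC-0001"},
]
listings = [
    {"name": "EXAMPLE CHIROPRACTIC LLC", "postal_code": "62701"},
    {"name": "Relax Massage Studio", "postal_code": "62701"},
]

verified, unverified = cross_reference(listings, licensed)
print([v["license_number"] for v in verified])  # ['DC-0001']
print([u["name"] for u in unverified])          # ['Relax Massage Studio']
```

An unmatched listing is not necessarily unlicensed; it only means the listing could not be tied to a license record under this matching rule.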

Limitations and Honest Caveats

Building from primary sources produces better data. It doesn't produce perfect data.

  • Licensing data is address-verified, not location-verified. A practitioner's license address is where they filed paperwork. Their practice location may be different.
  • Email enrichment is probabilistic. Finding a practice website and extracting an email doesn't verify that the address is monitored or goes to the right person.
  • Coverage is uneven by geography. States with better-maintained licensing databases produce better underlying data. States with incomplete online portals result in gaps.
  • The data ages. A list built six months ago has already lost accuracy as practices open, close, move, and change ownership. Any vendor claiming perpetually fresh data is not being honest.

How CRK Dev Builds

CRK Dev builds from public business directories and publicly accessible web sources, with standard enrichment for phone, website, and email. The methodology is transparent: publicly sourced, enriched where the data supports it, honest about coverage limitations.

Every dataset in the catalog includes an email field in its schema for every record. The field is populated only when publicly available data supports it, so an email address is not guaranteed for every record.

View the full dataset catalog — methodology and field coverage documented for every product.
