Export a HubSpot Contact's Email History to a Markdown File

When you’re trying to understand what happened with a customer, the HubSpot timeline is great for reading but frustrating for extracting. If every email came from your own address, you can fall back on Gmail or Outlook. But if you are reviewing conversations logged by another HubSpot user, or by several, pulling them out is nearly impossible.

HubSpot Breeze is great, but as of this writing I have found this method more effective for analysis and review in many use cases. Similarly, HubSpot's ChatGPT connector is great, but it only has access to metadata, not the full body of emails, so it falls short for this purpose.

Maybe you want to:

find something specific among a large body of emails
hand a clean comms history to leadership
do a post-mortem on a deal or onboarding
transfer context between team members
summarize the thread using AI tools
preserve a record for compliance or documentation

The problem: HubSpot timelines are dynamically loaded. "Save as" and "print to PDF" capture only a fraction of the email content. Highlighting the timeline and pasting it into Notepad often produces “Loading Timeline Activity…” instead of the actual emails. Extensions like GoFullPage can't pick it up either.

This guide shows a repeatable, reliable way to export a contact’s logged 1:1 emails into a single chronological Markdown file using a HubSpot Private App token and a small Python script. You can then drop that file into your favorite LLM and ask questions about the whole history.

What you’ll end up with

A single file like contact_emails.md
Emails in chronological order
Each entry includes timestamp, subject, and email body
Easy to paste into Notepad, share internally, or feed into analysis tools

What you’ll need

Access and permissions

You must have access to the HubSpot portal
You must be allowed to create a Private App, or have someone do it for you

Tools on your computer

Windows (the steps below assume Windows, but the same approach works on other operating systems)
Python installed (3.x)
Command Prompt (built-in)
Internet access from your machine (to call HubSpot APIs)

Step 1: Create a HubSpot Private App (HubSpot web app)

Log into HubSpot
Go to:
- Settings (gear icon)
- Integrations
- Private Apps (Legacy Apps)
Click Create private app
Give it a name like: Email Export Script
Go to the Scopes section

What scopes to enable

HubSpot permissions vary by portal, subscription, and how the email was logged. To avoid guessing “the one perfect scope,” the practical approach is:

Start with the minimum expected read scopes
If you get a “missing scopes” error, add the next scope and retry

Enable the following read scopes (names can vary by account):

Start here (most common)

crm.objects.contacts.read (read the contact + associations)

Commonly required for 1:1 sales email content

sales-email-read (often required to read 1:1 email engagement content)

If your portal exposes CRM Email object scopes

crm.objects.emails.read

If you later decide to export from Deals/Companies too

crm.objects.deals.read
crm.objects.companies.read

Optional (only if you want attachments referenced by emails)

File read scope(s) (exact naming varies by HubSpot account)

Your goal is read-only access. Enable the fewest scopes needed, test, and expand only as required.

Click Create app
Copy the Access token (it starts with pat-)

Keep this token private. Treat it like a password.

Step 2: Create a working folder (Windows File Explorer)

Create a folder anywhere, for example: C:\HubSpot Email Export\
You’ll put one Python file in this folder: export_hubspot_contact_emails_to_md.py

Step 3: Create the Python script file (Notepad or VS Code)

Open Notepad (or VS Code)
Paste the full script below
Save the file as: export_hubspot_contact_emails_to_md.py

Important: make sure Notepad doesn’t append .txt. In the “Save as type” dropdown, choose All Files, and name it exactly with .py.

The script (copy/paste as-is)

#!/usr/bin/env python3
"""
Export all HubSpot CRM "Email" objects associated to a Contact into a single
chronological Markdown file.

This uses:
- CRM v4 associations: Contact -> Emails
- CRM v3 emails batch read: to fetch subject/body/timestamps

Usage (Windows CMD):
  set HUBSPOT_TOKEN=pat-xxxxx
  python export_hubspot_contact_emails_to_md.py --contact-id 123 --out contact_emails.md
"""

from __future__ import annotations

import argparse
import os
import re
import sys
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

import requests

try:
    from bs4 import BeautifulSoup  # pip install beautifulsoup4
except ImportError:
    # bs4 is optional; html_to_text() falls back to regex-based tag stripping.
    BeautifulSoup = None


BASE_URL = "https://api.hubapi.com"
DEFAULT_TIMEOUT = 30
MAX_RETRIES = 7


@dataclass
class EmailRecord:
    email_id: str
    ts_ms: int
    ts_iso: str
    subject: str
    from_: str
    to: str
    cc: str
    direction: str
    status: str
    body: str


def _get_env_token() -> str:
    token = os.environ.get("HUBSPOT_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HUBSPOT_TOKEN environment variable is not set.")
    return token


def hubspot_request(method: str, url: str, token: str, **kwargs) -> requests.Response:
    """Request wrapper with retry/backoff for rate limits and transient failures."""
    headers = kwargs.pop("headers", {}) or {}
    headers["Authorization"] = f"Bearer {token}"
    headers.setdefault("Content-Type", "application/json")
    kwargs["headers"] = headers
    kwargs.setdefault("timeout", DEFAULT_TIMEOUT)

    last_resp: Optional[requests.Response] = None
    for attempt in range(MAX_RETRIES):
        resp = requests.request(method, url, **kwargs)
        last_resp = resp

        if resp.status_code in (429, 500, 502, 503, 504):
            retry_after = resp.headers.get("Retry-After")
            if retry_after and retry_after.isdigit():
                sleep_s = int(retry_after)
            else:
                sleep_s = min(2 ** attempt, 20)
            time.sleep(sleep_s)
            continue

        return resp

    return last_resp  # type: ignore[return-value]


def normalize_text(s: str) -> str:
    s = (s or "").replace("\r\n", "\n").replace("\r", "\n")
    s = re.sub(r"[ \t]+", " ", s)
    s = re.sub(r"\n{3,}", "\n\n", s)
    return s.strip()


def html_to_text(html: str) -> str:
    html = html or ""
    if not html.strip():
        return ""

    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style"]):
            tag.decompose()
        text = soup.get_text("\n")
        return normalize_text(text)

    # Fallback if bs4 isn't installed
    text = re.sub(r"(?i)<br\s*/?>", "\n", html)
    text = re.sub(r"(?i)</p\s*>", "\n\n", text)
    text = re.sub(r"<[^>]+>", "", text)
    return normalize_text(text)


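def pick_timestamp_ms(props: Dict[str, Any]) -> int:
    """Prefer hs_timestamp, fall back to created/modified date variants."""
    for key in ("hs_timestamp", "hs_createdate", "createdate", "hs_lastmodifieddate"):
        raw = str(props.get(key) or "").strip()
        if not raw:
            continue
        try:
            return int(raw)  # already epoch milliseconds
        except ValueError:
            pass
        # CRM v3 usually returns datetimes as ISO 8601 strings such as
        # "2024-01-05T14:30:00.000Z"; convert those to epoch milliseconds.
        try:
            dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        return int(dt.timestamp() * 1000)
    return 0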


def ms_to_iso(ms: int) -> str:
    if not ms:
        return "Unknown time"
    dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return dt.isoformat()


def get_associated_email_ids(contact_id: str, token: str) -> List[str]:
    """List Email object IDs associated to a Contact via CRM v4 associations."""
    email_ids: List[str] = []
    after: Optional[str] = None

    while True:
        url = f"{BASE_URL}/crm/v4/objects/contacts/{contact_id}/associations/emails"
        params: Dict[str, Any] = {"limit": 500}
        if after:
            params["after"] = after

        resp = hubspot_request("GET", url, token, params=params)
        if resp.status_code != 200:
            raise RuntimeError(f"Associations GET failed ({resp.status_code}): {resp.text}")

        data = resp.json()
        for r in data.get("results", []) or []:
            to_id = r.get("toObjectId")
            if to_id is not None:
                email_ids.append(str(to_id))

        # Guard against "paging" or "next" being present but null.
        after = ((data.get("paging") or {}).get("next") or {}).get("after")
        if not after:
            break

    # Dedupe while preserving order
    seen = set()
    deduped: List[str] = []
    for eid in email_ids:
        if eid not in seen:
            seen.add(eid)
            deduped.append(eid)
    return deduped


def batch_read_emails(email_ids: List[str], token: str) -> List[Dict[str, Any]]:
    """Batch read Email objects via CRM v3."""
    if not email_ids:
        return []

    props = [
        "hs_timestamp",
        "hs_createdate",
        "hs_lastmodifieddate",
        "hs_email_subject",
        "hs_email_text",
        "hs_email_html",
        "hs_email_from",
        "hs_email_to",
        "hs_email_cc",
        "hs_email_bcc",
        "hs_email_direction",
        "hs_email_status",
    ]

    out: List[Dict[str, Any]] = []
    chunk_size = 100

    for i in range(0, len(email_ids), chunk_size):
        chunk = email_ids[i : i + chunk_size]
        url = f"{BASE_URL}/crm/v3/objects/emails/batch/read"
        payload = {
            "properties": props,
            "inputs": [{"id": eid} for eid in chunk],
        }

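        resp = hubspot_request("POST", url, token, json=payload)
        # 207 (Multi-Status) signals partial success: some IDs could not be read,
        # but whatever is present in "results" is still usable.
        if resp.status_code not in (200, 207):
            raise RuntimeError(f"Batch read failed ({resp.status_code}): {resp.text}")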

        data = resp.json()
        out.extend(data.get("results", []) or [])

    return out


def to_email_record(email_obj: Dict[str, Any]) -> EmailRecord:
    email_id = str(email_obj.get("id", ""))
    props = email_obj.get("properties") or {}

    ts_ms = pick_timestamp_ms(props)
    ts_iso = ms_to_iso(ts_ms)

    subject = normalize_text(props.get("hs_email_subject") or "(no subject)")
    from_ = normalize_text(props.get("hs_email_from") or "")
    to = normalize_text(props.get("hs_email_to") or "")
    cc = normalize_text(props.get("hs_email_cc") or "")
    direction = normalize_text(props.get("hs_email_direction") or "")
    status = normalize_text(props.get("hs_email_status") or "")

    text_body = props.get("hs_email_text") or ""
    html_body = props.get("hs_email_html") or ""
    body = normalize_text(text_body) if str(text_body).strip() else html_to_text(str(html_body))
    if not body:
        body = "(no body captured)"

    return EmailRecord(
        email_id=email_id,
        ts_ms=ts_ms,
        ts_iso=ts_iso,
        subject=subject,
        from_=from_,
        to=to,
        cc=cc,
        direction=direction,
        status=status,
        body=body,
    )


def write_markdown(contact_id: str, records: List[EmailRecord], out_path: str) -> None:
    now_utc = datetime.now(timezone.utc).isoformat()

    lines: List[str] = []
    lines.append(f"# HubSpot emails for contact {contact_id}")
    lines.append("")
    lines.append(f"- Exported at (UTC): {now_utc}")
    lines.append(f"- Total emails: {len(records)}")
    lines.append("")

    for r in records:
        lines.append(f"## {r.ts_iso} — {r.subject}")
        if r.direction:
            lines.append(f"- Direction: {r.direction}")
        if r.status:
            lines.append(f"- Status: {r.status}")
        if r.from_:
            lines.append(f"- From: {r.from_}")
        if r.to:
            lines.append(f"- To: {r.to}")
        if r.cc:
            lines.append(f"- CC: {r.cc}")
        lines.append(f"- Email object ID: {r.email_id}")
        lines.append("")
        lines.append(r.body)
        lines.append("")

    content = "\n".join(lines).rstrip() + "\n"
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(content)


def main() -> int:
    parser = argparse.ArgumentParser(description="Export HubSpot contact-associated Email objects to Markdown.")
    parser.add_argument("--contact-id", required=True, help="HubSpot Contact record ID / hs_object_id")
    parser.add_argument("--out", required=True, help="Output markdown path")
    args = parser.parse_args()

    token = _get_env_token()
    contact_id = str(args.contact_id).strip()
    out_path = args.out

    email_ids = get_associated_email_ids(contact_id, token)
    if not email_ids:
        write_markdown(contact_id, [], out_path)
        print(f"Wrote {out_path} (0 emails found via Email object associations)")
        print("If the HubSpot UI shows emails, they may be stored as timeline events rather than Email objects.")
        return 0

    email_objs = batch_read_emails(email_ids, token)
    records = [to_email_record(o) for o in email_objs]
    records.sort(key=lambda r: r.ts_ms)

    write_markdown(contact_id, records, out_path)
    print(f"Wrote {out_path} ({len(records)} emails)")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

Step 4: Install Python dependencies (Command Prompt)

This only needs to be done once.

Open Command Prompt
Navigate to your folder (example):

cd "C:\HubSpot Email Export"

Install dependencies:

pip install requests beautifulsoup4
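
If you want to confirm the install worked before running the export, a quick check like the one below can help. This is a minimal sketch and not part of the export script: `missing_modules` is a hypothetical helper name, and it relies only on the standard library's `importlib.util.find_spec`. Note that the `beautifulsoup4` package is imported under the name `bs4`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that this Python install cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # "bs4" is the import name of the beautifulsoup4 package.
    missing = missing_modules(["requests", "bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

If anything is reported missing, rerun the pip command above from the same Command Prompt window you will use to run the export.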

Step 5: Store your HubSpot token as an environment variable (Command Prompt)

In the same Command Prompt window:

set HUBSPOT_TOKEN=pat-PASTE-YOUR-TOKEN-HERE

Notes:

Do not include quotes
Do not include the word Bearer
This token will exist only for this Command Prompt session
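
The notes above can be turned into a small sanity check that runs before the export. The sketch below is an illustration only: `token_problems` is a hypothetical helper, and the only rules it applies are the ones stated in this guide (no quotes, no "Bearer", starts with pat-). It never prints the token itself.

```python
import os
from typing import List


def token_problems(raw: str) -> List[str]:
    """Return human-readable problems with a HUBSPOT_TOKEN value (empty list = looks fine)."""
    value = raw.strip()
    if not value:
        return ["HUBSPOT_TOKEN is empty or not set"]
    problems: List[str] = []
    if value[0] in "\"'" or value[-1] in "\"'":
        problems.append("value is wrapped in quotes")
        value = value.strip("\"'")
    if value.lower().startswith("bearer "):
        problems.append("value includes the word Bearer")
        value = value[len("bearer "):].strip()
    if not value.startswith("pat-"):
        problems.append("value does not start with pat-")
    return problems


if __name__ == "__main__":
    issues = token_problems(os.environ.get("HUBSPOT_TOKEN", ""))
    print("Token looks fine." if not issues else "Problems: " + "; ".join(issues))
```

A clean result only means the value has the expected shape. It does not prove the token is valid or has the right scopes.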

Step 6: Find the Contact ID in HubSpot (HubSpot web app)

You need the contact’s record ID (also called the object ID / hs_object_id).

Common ways to find it:

Look at the contact record URL (the numeric ID is usually visible)
Create a contact view that includes the Record ID column
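
If you would rather paste the record URL than pick the ID out by eye, a helper along these lines can extract it. Treat this as a sketch under assumptions: the two URL shapes in the comment are what contact record URLs commonly look like, but they are not taken from this guide and may differ in your portal. `contact_id_from_url` and the portal ID in the example are hypothetical.

```python
import re
from typing import Optional

# Assumed URL shapes (verify against your own portal):
#   https://app.hubspot.com/contacts/<portal-id>/record/0-1/<contact-id>
#   https://app.hubspot.com/contacts/<portal-id>/contact/<contact-id>
_CONTACT_URL = re.compile(r"/contacts/\d+/(?:record/0-1|contact)/(\d+)")


def contact_id_from_url(url: str) -> Optional[str]:
    """Return the numeric contact ID embedded in a record URL, or None if not found."""
    match = _CONTACT_URL.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical portal ID; the contact ID matches the example used in Step 7.
    example = "https://app.hubspot.com/contacts/1234567/record/0-1/118519123288"
    print(contact_id_from_url(example))
```

If the helper returns None, fall back to the Record ID column in a contact view.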

Step 7: Run the export (Command Prompt)

python export_hubspot_contact_emails_to_md.py --contact-id 118519123288 --out contact_emails.md

After it finishes, you’ll have contact_emails.md in the same folder.

Step 8: Open the file

notepad contact_emails.md

Or open it in VS Code for easier navigation/search.
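
For a large export, a few lines of Python can do the searching for you. The sketch below assumes the file layout produced by the script in Step 3, where each email starts with a line beginning with "## ". The helper names `split_entries` and `find_entries` and the keyword "renewal" are hypothetical. One known limitation: an email body that itself contains a line starting with "## " would be split into an extra entry.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def split_entries(markdown: str) -> List[Tuple[str, str]]:
    """Split the export into (heading, text) pairs, one per '## ' section."""
    entries: List[Tuple[str, str]] = []
    heading: Optional[str] = None
    buf: List[str] = []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if heading is not None:
                entries.append((heading, "\n".join(buf).strip()))
            heading, buf = line[3:].strip(), []
        elif heading is not None:
            buf.append(line)
    if heading is not None:
        entries.append((heading, "\n".join(buf).strip()))
    return entries


def find_entries(markdown: str, keyword: str) -> List[str]:
    """Return headings of entries whose heading or text contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [h for h, text in split_entries(markdown)
            if needle in h.lower() or needle in text.lower()]


if __name__ == "__main__":
    path = Path("contact_emails.md")
    if path.exists():
        content = path.read_text(encoding="utf-8")
        print(len(split_entries(content)), "emails in file")
        for heading in find_entries(content, "renewal"):
            print(heading)
```

Because each heading carries the timestamp and subject, the matching headings double as a quick index into the file.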

Troubleshooting

“Missing scopes”

If you see an error indicating missing scopes:

Go back to Private Apps
Edit your app’s scopes
Add read scopes incrementally and retry

Suggested order:

crm.objects.contacts.read
sales-email-read
crm.objects.emails.read (if available in your portal)
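
The export script raises an error that includes the raw response text, and that text often hints at which scope is missing. The sketch below shows one way to pull the hint out. It rests on assumptions you should verify against a real error from your portal: that the error JSON has a "category" of "MISSING_SCOPES", and that candidate scopes are listed under `errors[].context` in keys named `requiredGranularScopes` or `requiredScopes`. Both helper names are hypothetical.

```python
from typing import Any, Dict, List


def is_missing_scopes_error(status_code: int, body: Dict[str, Any]) -> bool:
    """True if the response looks like a missing-scopes error (assumed shape)."""
    return status_code == 403 and body.get("category") == "MISSING_SCOPES"


def suggested_scopes(body: Dict[str, Any]) -> List[str]:
    """Collect scope names mentioned in the error body, in first-seen order."""
    hints: List[str] = []
    for err in body.get("errors") or []:
        context = err.get("context") or {}
        # Key names are assumptions; adjust them to match what your portal returns.
        for key in ("requiredGranularScopes", "requiredScopes"):
            for scope in context.get(key) or []:
                if scope not in hints:
                    hints.append(scope)
    return hints


if __name__ == "__main__":
    # Hand-written sample body for illustration, not a captured response.
    sample = {
        "category": "MISSING_SCOPES",
        "errors": [{"context": {"requiredGranularScopes": ["crm.objects.contacts.read"]}}],
    }
    if is_missing_scopes_error(403, sample):
        print("Add one of:", ", ".join(suggested_scopes(sample)))
```

If the body lists several scopes, add them one at a time and retry, in keeping with the fewest-scopes approach from Step 1.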

“0 emails found” but HubSpot shows emails on the timeline

In some portals, timeline email activities are stored as timeline events rather than CRM Email objects associated to the contact. If that happens:

the script will still create a Markdown file
it will contain 0 emails
you’ll need an alternate export approach that targets timeline events

Token safety

Do not paste tokens into screenshots, Slack, or shared docs
If a token is exposed, revoke it and create a new one

Why this is better than copying from the timeline

The HubSpot timeline UI is virtualized and lazy-loaded
“Select all + copy” often captures placeholders instead of content
The API export gives you a consistent, complete artifact you can analyze and share

If you want to make this even more hand-off friendly for coworkers, a good next enhancement is a version that prompts for the contact ID and output name interactively (no command-line flags required).
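
As a starting point for that enhancement, here is a minimal sketch of a wrapper that asks for both values and then runs the existing script. It assumes the wrapper sits in the same folder as export_hubspot_contact_emails_to_md.py and that HUBSPOT_TOKEN is already set in the session. The names `build_command` and `run_export` are hypothetical, and the prompt function is passed in as a parameter so the logic can be tried without typing.

```python
import subprocess
import sys
from typing import Callable, List

SCRIPT_NAME = "export_hubspot_contact_emails_to_md.py"
DEFAULT_OUT = "contact_emails.md"


def build_command(ask: Callable[[str], str] = input) -> List[str]:
    """Prompt for the contact ID and output name, then return the export command."""
    contact_id = ask("Contact ID: ").strip()
    while not contact_id.isdigit():
        contact_id = ask("Contact ID must be numeric. Try again: ").strip()
    out_name = ask(f"Output file [{DEFAULT_OUT}]: ").strip() or DEFAULT_OUT
    if not out_name.lower().endswith(".md"):
        out_name += ".md"
    # Reuse the current interpreter so the same installed packages are available.
    return [sys.executable, SCRIPT_NAME, "--contact-id", contact_id, "--out", out_name]


def run_export() -> int:
    """Ask the questions, run the export script, and return its exit code."""
    return subprocess.call(build_command())
```

To use it, save the block as its own .py file next to the export script and add a final line that calls `run_export()`. Coworkers can then run that one file and answer two prompts.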