When you’re trying to understand what happened with a customer, the HubSpot timeline is great for reading but frustrating for extracting. If all the emails came from your own address, you can search Gmail or Outlook, but if you're reviewing conversations logged by another HubSpot user, or by several, it's nearly impossible.
HubSpot Breeze is great, but as of this writing, I've found this method more effective for analysis and review in many use cases. Similarly, HubSpot's ChatGPT connector is great, but it only has access to metadata, not the full body of emails, so it falls short for this purpose.
Maybe you want to:
- find something specific among a large body of emails
- hand a clean comms history to leadership
- do a post-mortem on a deal or onboarding
- transfer context between team members
- summarize the thread using AI tools
- preserve a record for compliance or documentation
The problem: HubSpot timelines are dynamically loaded, so "Save as" and "Print to PDF" capture only a fraction of the email content. Highlighting and copying into Notepad often produces “Loading Timeline Activity…” instead of the actual emails, and extensions like GoFullPage can't capture it either.
This guide shows a repeatable, reliable way to export a contact’s logged 1:1 emails into a single chronological Markdown file using a HubSpot Private App token and a small Python script. You can then drop that file into your favorite LLM and get great answers.
What you’ll end up with
- A single file like contact_emails.md
- Emails in chronological order (oldest first)
- Each entry includes a timestamp (UTC), subject, sender and recipients when HubSpot has them, and the email body
- Easy to paste into Notepad, share internally, or feed into analysis tools
What you’ll need
Access and permissions
- You must have access to the HubSpot portal
- You must be allowed to create a Private App, or have someone do it for you
Tools on your computer
- Windows (the steps below assume Windows, but the same approach works on other operating systems)
- Python 3 installed (3.7 or later)
- Command Prompt (built-in)
- Internet access from your machine (to call HubSpot APIs)
Step 1: Create a HubSpot Private App (HubSpot web app)
- Log into HubSpot
- Go to:
- Settings (gear icon)
- Integrations
- Private Apps (Legacy Apps)
- Click Create private app
- Give it a name like: Email Export Script
- Go to the Scopes section
What scopes to enable
HubSpot permissions vary by portal, subscription, and how the email was logged. To avoid guessing “the one perfect scope,” the practical approach is:
- Start with the minimum expected read scopes
- If you get a “missing scopes” error, add the next scope and retry
Enable the following read scopes (names can vary by account):
Start here (most common)
crm.objects.contacts.read (read the contact + associations)
Commonly required for 1:1 sales email content
sales-email-read (often required to read 1:1 email engagement content)
If your portal exposes CRM Email object scopes
crm.objects.emails.read
If you later decide to export from Deals/Companies too
crm.objects.deals.read
crm.objects.companies.read
Optional (only if you want attachments referenced by emails)
- File read scope(s) (exact naming varies by HubSpot account)
Your goal is read-only access. Enable the fewest scopes needed, test, and expand only as required.
- Click Create app
- Copy the Access token (it starts with pat-)
Keep this token private. Treat it like a password.
Step 2: Create a working folder (Windows File Explorer)
- Create a folder anywhere, for example: C:\HubSpot Email Export\
- You’ll put one Python file in this folder: export_hubspot_contact_emails_to_md.py
Step 3: Create the Python script file (Notepad or VS Code)
- Open Notepad (or VS Code)
- Paste the full script below
- Save the file as:
export_hubspot_contact_emails_to_md.py
Important: make sure Notepad doesn’t append .txt. In the “Save as type” dropdown, choose All Files, and name it exactly with .py.
The script (copy/paste as-is)
```python
#!/usr/bin/env python3
"""
Export all HubSpot CRM "Email" objects associated to a Contact into a single
chronological Markdown file.

This uses:
- CRM v4 associations: Contact -> Emails
- CRM v3 emails batch read: to fetch subject/body/timestamps

Usage (Windows CMD):
    set HUBSPOT_TOKEN=pat-xxxxx
    python export_hubspot_contact_emails_to_md.py --contact-id 123 --out contact_emails.md
"""

from __future__ import annotations

import argparse
import os
import re
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

import requests

try:
    from bs4 import BeautifulSoup  # optional: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # html_to_text() falls back to simple regex stripping

BASE_URL = "https://api.hubapi.com"
DEFAULT_TIMEOUT = 30  # seconds per HTTP request
MAX_RETRIES = 7  # attempts for rate-limited (429) or transient 5xx responses


@dataclass
class EmailRecord:
    email_id: str
    ts_ms: int
    ts_iso: str
    subject: str
    from_: str
    to: str
    cc: str
    direction: str
    status: str
    body: str


def _get_env_token() -> str:
    token = os.environ.get("HUBSPOT_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HUBSPOT_TOKEN environment variable is not set.")
    return token


def hubspot_request(method: str, url: str, token: str, **kwargs) -> requests.Response:
    """Request wrapper with retry/backoff for rate limits and transient failures."""
    headers = kwargs.pop("headers", {}) or {}
    headers["Authorization"] = f"Bearer {token}"
    headers.setdefault("Content-Type", "application/json")
    kwargs["headers"] = headers
    kwargs.setdefault("timeout", DEFAULT_TIMEOUT)

    last_resp: Optional[requests.Response] = None
    for attempt in range(MAX_RETRIES):
        resp = requests.request(method, url, **kwargs)
        last_resp = resp
        if resp.status_code in (429, 500, 502, 503, 504):
            # Honor Retry-After if present; otherwise back off exponentially (capped at 20 s).
            retry_after = resp.headers.get("Retry-After")
            if retry_after and retry_after.isdigit():
                sleep_s = int(retry_after)
            else:
                sleep_s = min(2 ** attempt, 20)
            time.sleep(sleep_s)
            continue
        return resp
    # Still failing after all retries: return the last response so the caller reports it.
    return last_resp  # type: ignore[return-value]


def normalize_text(s: str) -> str:
    """Normalize line endings and collapse runs of spaces/tabs and blank lines."""
    s = (s or "").replace("\r\n", "\n").replace("\r", "\n")
    s = re.sub(r"[ \t]+", " ", s)
    s = re.sub(r"\n{3,}", "\n\n", s)
    return s.strip()


def html_to_text(html: str) -> str:
    html = html or ""
    if not html.strip():
        return ""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style"]):
            tag.decompose()
        text = soup.get_text("\n")
        return normalize_text(text)
    # Fallback if bs4 isn't installed
    text = re.sub(r"(?i)<br\s*/?>", "\n", html)
    text = re.sub(r"(?i)</p\s*>", "\n\n", text)
    text = re.sub(r"<[^>]+>", "", text)
    return normalize_text(text)


def parse_timestamp_ms(value: Any) -> Optional[int]:
    """Parse a HubSpot datetime given as epoch milliseconds or an ISO 8601 string."""
    s = str(value).strip()
    if not s:
        return None
    if s.isdigit():
        return int(s)
    try:
        # CRM v3 responses typically return datetimes like "2024-01-05T14:03:22.123Z";
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
    except ValueError:
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return round(dt.timestamp() * 1000)


def pick_timestamp_ms(props: Dict[str, Any]) -> int:
    """Prefer hs_timestamp, fall back to created/modified date variants."""
    for key in ("hs_timestamp", "hs_createdate", "createdate", "hs_lastmodifieddate"):
        v = props.get(key)
        if v is None:
            continue
        ms = parse_timestamp_ms(v)
        if ms is not None:
            return ms
    return 0


def ms_to_iso(ms: int) -> str:
    if not ms:
        return "Unknown time"
    dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return dt.isoformat()


def get_associated_email_ids(contact_id: str, token: str) -> List[str]:
    """List Email object IDs associated to a Contact via CRM v4 associations."""
    email_ids: List[str] = []
    after: Optional[str] = None
    url = f"{BASE_URL}/crm/v4/objects/contacts/{contact_id}/associations/emails"
    while True:
        params: Dict[str, Any] = {"limit": 500}
        if after:
            params["after"] = after
        resp = hubspot_request("GET", url, token, params=params)
        if resp.status_code != 200:
            raise RuntimeError(f"Associations GET failed ({resp.status_code}): {resp.text}")
        data = resp.json()
        for r in data.get("results", []) or []:
            to_id = r.get("toObjectId")
            if to_id is not None:
                email_ids.append(str(to_id))
        # "paging.next" is absent (or null) on the last page.
        after = ((data.get("paging") or {}).get("next") or {}).get("after")
        if not after:
            break

    # Dedupe while preserving order
    seen = set()
    deduped: List[str] = []
    for eid in email_ids:
        if eid not in seen:
            seen.add(eid)
            deduped.append(eid)
    return deduped


def batch_read_emails(email_ids: List[str], token: str) -> List[Dict[str, Any]]:
    """Batch read Email objects via CRM v3 (up to 100 IDs per request)."""
    if not email_ids:
        return []
    props = [
        "hs_timestamp",
        "hs_createdate",
        "hs_lastmodifieddate",
        "hs_email_subject",
        "hs_email_text",
        "hs_email_html",
        "hs_email_from",
        "hs_email_to",
        "hs_email_cc",
        "hs_email_bcc",
        # Sender/recipient addresses may instead live in these read-only
        # properties; request both sets and use whichever is populated.
        "hs_email_from_email",
        "hs_email_to_email",
        "hs_email_cc_email",
        "hs_email_direction",
        "hs_email_status",
    ]
    out: List[Dict[str, Any]] = []
    chunk_size = 100
    url = f"{BASE_URL}/crm/v3/objects/emails/batch/read"
    for i in range(0, len(email_ids), chunk_size):
        chunk = email_ids[i : i + chunk_size]
        payload = {
            "properties": props,
            "inputs": [{"id": eid} for eid in chunk],
        }
        resp = hubspot_request("POST", url, token, json=payload)
        # 207 (Multi-Status) means some IDs could not be read; keep the ones that were.
        if resp.status_code not in (200, 207):
            raise RuntimeError(f"Batch read failed ({resp.status_code}): {resp.text}")
        data = resp.json()
        out.extend(data.get("results", []) or [])
    return out


def to_email_record(email_obj: Dict[str, Any]) -> EmailRecord:
    email_id = str(email_obj.get("id", ""))
    props = email_obj.get("properties") or {}

    ts_ms = pick_timestamp_ms(props)
    ts_iso = ms_to_iso(ts_ms)

    subject = normalize_text(props.get("hs_email_subject") or "(no subject)")
    from_ = normalize_text(props.get("hs_email_from") or props.get("hs_email_from_email") or "")
    to = normalize_text(props.get("hs_email_to") or props.get("hs_email_to_email") or "")
    cc = normalize_text(props.get("hs_email_cc") or props.get("hs_email_cc_email") or "")
    direction = normalize_text(props.get("hs_email_direction") or "")
    status = normalize_text(props.get("hs_email_status") or "")

    # Prefer the plain-text body; fall back to converting the HTML body.
    text_body = props.get("hs_email_text") or ""
    html_body = props.get("hs_email_html") or ""
    body = normalize_text(text_body) if str(text_body).strip() else html_to_text(str(html_body))
    if not body:
        body = "(no body captured)"

    return EmailRecord(
        email_id=email_id,
        ts_ms=ts_ms,
        ts_iso=ts_iso,
        subject=subject,
        from_=from_,
        to=to,
        cc=cc,
        direction=direction,
        status=status,
        body=body,
    )


def write_markdown(contact_id: str, records: List[EmailRecord], out_path: str) -> None:
    now_utc = datetime.now(timezone.utc).isoformat()
    lines: List[str] = []
    lines.append(f"# HubSpot emails for contact {contact_id}")
    lines.append("")
    lines.append(f"- Exported at (UTC): {now_utc}")
    lines.append(f"- Total emails: {len(records)}")
    lines.append("")

    for r in records:
        lines.append(f"## {r.ts_iso} — {r.subject}")
        if r.direction:
            lines.append(f"- Direction: {r.direction}")
        if r.status:
            lines.append(f"- Status: {r.status}")
        if r.from_:
            lines.append(f"- From: {r.from_}")
        if r.to:
            lines.append(f"- To: {r.to}")
        if r.cc:
            lines.append(f"- CC: {r.cc}")
        lines.append(f"- Email object ID: {r.email_id}")
        lines.append("")
        lines.append(r.body)
        lines.append("")

    content = "\n".join(lines).rstrip() + "\n"
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(content)


def main() -> int:
    parser = argparse.ArgumentParser(description="Export HubSpot contact-associated Email objects to Markdown.")
    parser.add_argument("--contact-id", required=True, help="HubSpot Contact record ID / hs_object_id")
    parser.add_argument("--out", required=True, help="Output markdown path")
    args = parser.parse_args()

    token = _get_env_token()
    contact_id = str(args.contact_id).strip()
    out_path = args.out

    email_ids = get_associated_email_ids(contact_id, token)
    if not email_ids:
        write_markdown(contact_id, [], out_path)
        print(f"Wrote {out_path} (0 emails found via Email object associations)")
        print("If the HubSpot UI shows emails, they may be stored as timeline events rather than Email objects.")
        return 0

    email_objs = batch_read_emails(email_ids, token)
    records = [to_email_record(o) for o in email_objs]
    records.sort(key=lambda r: r.ts_ms)  # oldest first; unknown timestamps (0) sort to the top
    write_markdown(contact_id, records, out_path)
    print(f"Wrote {out_path} ({len(records)} emails)")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
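The associations call returns results one page at a time. The script keeps requesting pages until a response has no paging.next.after cursor, and then it reads the emails in chunks of 100. The offline sketch below isolates that pattern so you can see how it behaves without a token. The names fake_pages, collect_ids, and chunks are hypothetical stand-ins; they are not part of HubSpot's API or the script above.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical canned responses shaped like the v4 associations payload:
# every page has "results"; all but the last carry a "paging.next.after" cursor.
fake_pages: Dict[Optional[str], Dict[str, Any]] = {
    None: {
        "results": [{"toObjectId": 11}, {"toObjectId": 12}],
        "paging": {"next": {"after": "cursor-2"}},
    },
    "cursor-2": {"results": [{"toObjectId": 12}, {"toObjectId": 13}]},
}


def collect_ids(fetch_page: Callable[[Optional[str]], Dict[str, Any]]) -> List[str]:
    """Follow the 'after' cursor until there is no next page, deduplicating in order."""
    ids: List[str] = []
    after: Optional[str] = None
    while True:
        data = fetch_page(after)
        for r in data.get("results") or []:
            ids.append(str(r["toObjectId"]))
        # A missing or null "next" means this was the last page.
        after = ((data.get("paging") or {}).get("next") or {}).get("after")
        if not after:
            break
    return list(dict.fromkeys(ids))  # dicts keep insertion order (Python 3.7+)


def chunks(items: List[str], size: int = 100) -> List[List[str]]:
    """Split IDs into batch-read-sized groups (the script uses 100 per request)."""
    return [items[i : i + size] for i in range(0, len(items), size)]


email_ids = collect_ids(lambda after: fake_pages[after])
print(email_ids)  # ['11', '12', '13']
print(len(chunks([str(n) for n in range(250)])))  # 3
```

Deduplication matters because the same email can appear more than once across association results. Chunking keeps each batch read within the 100-ID request size that the script uses.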
Step 4: Install Python dependencies (Command Prompt)
This only needs to be done once.
- Open Command Prompt
- Navigate to your folder (example):
cd "C:\HubSpot Email Export"
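- Install dependencies (beautifulsoup4 is optional; without it the script falls back to a simpler HTML-to-text conversion):
pip install requests beautifulsoup4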
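If you want to confirm the install before going further, a short check like this can help. It's a sketch: save it under a hypothetical name such as check_setup.py and run python check_setup.py from the same folder. The 3.7 minimum is an inference from the script's use of from __future__ import annotations and dataclasses.

```python
import importlib.util
import sys


def check_setup() -> dict:
    """Report whether this Python can run the export script."""
    return {
        # from __future__ import annotations and dataclasses need Python 3.7+
        "python_ok": sys.version_info >= (3, 7),
        "python_version": sys.version.split()[0],
        # find_spec() returns None when a top-level package is not installed
        "requests_installed": importlib.util.find_spec("requests") is not None,
        # Optional: without bs4 the script uses its regex HTML fallback
        "bs4_installed": importlib.util.find_spec("bs4") is not None,
    }


if __name__ == "__main__":
    for name, value in check_setup().items():
        print(f"{name}: {value}")
```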
Step 5: Store your HubSpot token as an environment variable (Command Prompt)
In the same Command Prompt window:
set HUBSPOT_TOKEN=pat-PASTE-YOUR-TOKEN-HERE
Notes:
- Do not include quotes
- Do not include the word Bearer
- This token will exist only for this Command Prompt session
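The notes above cover the usual ways a token gets mangled. If the script later fails with a 401, a small check like the one below can catch those mistakes. check_token_format is a hypothetical helper. It only inspects the shape of the string and cannot tell whether HubSpot will actually accept the token. Run it in the same Command Prompt window, because the variable exists only in that session.

```python
import os
from typing import List


def check_token_format(token: str) -> List[str]:
    """Return likely formatting problems with a Private App token (empty list = looks fine)."""
    problems: List[str] = []
    t = token.strip()  # the export script also strips surrounding whitespace
    if not t:
        return ["HUBSPOT_TOKEN is empty or not set in this Command Prompt window"]
    if t[0] in "\"'" or t[-1] in "\"'":
        problems.append("remove the quotes around the token")
        t = t.strip("\"'")
    if t.lower().startswith("bearer"):
        problems.append("remove the word Bearer (the script adds it)")
        t = t[len("bearer"):].strip()
    if " " in t:
        problems.append("the token should not contain spaces")
    if not t.startswith("pat-"):
        problems.append("Private App tokens normally start with pat-")
    return problems


if __name__ == "__main__":
    issues = check_token_format(os.environ.get("HUBSPOT_TOKEN", ""))
    print("Token looks well-formed" if not issues else "\n".join(issues))
```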
Step 6: Find the Contact ID in HubSpot (HubSpot web app)
You need the contact’s record ID (also called the object ID / hs_object_id).
Common ways to find it:
- Look at the contact record URL (the numeric ID is usually visible)
- Create a contact view that includes the Record ID column
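If you copy the record URL, a helper like the hypothetical contact_id_from_url below can pull out the ID. It assumes two common URL shapes: .../record/0-1/<id> (0-1 is HubSpot's object type ID for contacts) and the older .../contact/<id>. Adjust the patterns if your portal's URLs look different.

```python
import re
from typing import Optional

# Assumed URL shapes; your portal's URLs may differ.
_PATTERNS = (
    re.compile(r"/record/0-1/(\d+)"),  # e.g. .../contacts/<portal>/record/0-1/<id>
    re.compile(r"/contact/(\d+)"),     # older .../contacts/<portal>/contact/<id>
)


def contact_id_from_url(url: str) -> Optional[str]:
    """Return the numeric contact ID from a HubSpot contact record URL, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


# Hypothetical portal ID 1234567; prints 118519123288
print(contact_id_from_url("https://app.hubspot.com/contacts/1234567/record/0-1/118519123288"))
```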
Step 7: Run the export (Command Prompt)
python export_hubspot_contact_emails_to_md.py --contact-id 118519123288 --out contact_emails.md
After it finishes, you’ll have contact_emails.md in the same folder.
Step 8: Open the file
notepad contact_emails.md
Or open it in VS Code for easier navigation/search.
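Before handing the file off, a quick sanity check is to compare the "Total emails" line in the header with the number of "## " entry headings. The hypothetical summarize_export below relies only on the layout the script writes: one "- Total emails: N" line and one "## " heading per email. If an email body itself contains a line starting with "## ", the heading count will be inflated.

```python
import os
import re
from typing import Dict


def summarize_export(markdown: str) -> Dict[str, int]:
    """Compare the declared email count with the number of '## ' entry headings."""
    declared = re.search(r"^- Total emails: (\d+)$", markdown, flags=re.MULTILINE)
    headings = sum(1 for line in markdown.splitlines() if line.startswith("## "))
    return {
        "declared": int(declared.group(1)) if declared else -1,  # -1 = header line not found
        "headings": headings,
    }


if __name__ == "__main__":
    path = "contact_emails.md"  # the --out name used in Step 7
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            print(summarize_export(f.read()))
    else:
        print(f"{path} not found; run this from the export folder")
```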
Troubleshooting
“Missing scopes”
If you see an error indicating missing scopes:
- Go back to Private Apps
- Edit your app’s scopes
- Add read scopes incrementally and retry
Suggested order:
- crm.objects.contacts.read
- sales-email-read
- crm.objects.emails.read (if available in your portal)
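HubSpot's API errors come back as JSON. When a Private App lacks a scope, the response is typically HTTP 403 with a top-level "category" of "MISSING_SCOPES". The sketch below assumes that shape, and it also assumes that scope names may be listed somewhere under errors[].context. Treat both field layouts as assumptions and check them against the raw error text the script prints. parse_missing_scopes is a hypothetical helper.

```python
import json
from typing import List, Tuple


def parse_missing_scopes(error_text: str) -> Tuple[bool, List[str]]:
    """Return (is_missing_scopes, scope_names) from a HubSpot error body.

    Assumes a JSON object with a top-level "category" and optional lists of
    scope names inside errors[i]["context"]; any other shape yields (False, []).
    """
    try:
        body = json.loads(error_text)
    except ValueError:  # not JSON at all
        return False, []
    if not isinstance(body, dict):
        return False, []
    is_missing = body.get("category") == "MISSING_SCOPES"
    scopes: List[str] = []
    for err in body.get("errors") or []:
        context = err.get("context") if isinstance(err, dict) else None
        # Collect every string found in any list-valued context field, without duplicates.
        for values in (context or {}).values():
            if not isinstance(values, list):
                continue
            for v in values:
                if isinstance(v, str) and v not in scopes:
                    scopes.append(v)
    return is_missing, scopes
```

To use it, pass in the JSON portion of the message that appears after "Associations GET failed (403):" or "Batch read failed (403):". If the function finds no scope names, it is usually quickest to follow the suggested order above.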
“0 emails found” but HubSpot shows emails on the timeline
In some portals, timeline email activities are stored as timeline events rather than CRM Email objects associated to the contact. If that happens:
- the script will still create a Markdown file
- it will contain 0 emails
- you’ll need an alternate export approach that targets timeline events
Token safety
- Do not paste tokens into screenshots, Slack, or shared docs
- If a token is exposed, revoke it and create a new one
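If you need to share a terminal transcript or an error message, masking anything that looks like a Private App token first is a cheap safeguard. The hypothetical redact_tokens helper below assumes that tokens are runs of letters, digits, and hyphens beginning with pat-. It is a precaution for sharing, not a substitute for revoking a token that has already leaked.

```python
import re

# Assumption: Private App tokens start with "pat-" followed by letters, digits, and hyphens.
_TOKEN_RE = re.compile(r"\bpat-[A-Za-z0-9-]+")


def redact_tokens(text: str) -> str:
    """Replace anything shaped like a Private App token with a fixed placeholder."""
    return _TOKEN_RE.sub("pat-***REDACTED***", text)


print(redact_tokens("set HUBSPOT_TOKEN=pat-na1-1234abcd-ef56"))
# set HUBSPOT_TOKEN=pat-***REDACTED***
```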
Why this is better than copying from the timeline
- The HubSpot timeline UI is virtualized and lazy-loaded
- “Select all + copy” often captures placeholders instead of content
- The API export gives you a consistent, complete artifact you can analyze and share
If you want to make this even more hand-off friendly for coworkers, a good next enhancement is a version that prompts for the contact ID and output name interactively (no command-line flags required).
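Here is a minimal sketch of that interactive version, assuming it is saved in the same folder as the export script under a hypothetical name such as run_export_interactive.py. It asks for the token (hidden as you enter it, via getpass) only if HUBSPOT_TOKEN isn't already set. It then asks for the contact ID and an output name, and runs the existing script through subprocess. build_command is a hypothetical helper. The prompts only appear when the file is started from a terminal.

```python
import getpass
import os
import subprocess
import sys
from typing import List

SCRIPT = "export_hubspot_contact_emails_to_md.py"  # the script from Step 3, same folder


def build_command(contact_id: str, out_name: str) -> List[str]:
    """Build the same command line as Step 7, adding .md if the name lacks it."""
    contact_id = contact_id.strip()
    if not contact_id.isdigit():
        raise ValueError("Contact ID should be the numeric record ID")
    out_name = out_name.strip() or "contact_emails.md"
    if not out_name.lower().endswith(".md"):
        out_name += ".md"
    return [sys.executable, SCRIPT, "--contact-id", contact_id, "--out", out_name]


def main() -> int:
    env = dict(os.environ)
    if not env.get("HUBSPOT_TOKEN"):
        # getpass does not echo the token to the screen
        env["HUBSPOT_TOKEN"] = getpass.getpass("HubSpot token (pat-...): ").strip()
    contact_id = input("Contact ID: ")
    out_name = input("Output file name [contact_emails.md]: ")
    cmd = build_command(contact_id, out_name)
    # The token reaches the child process through its environment, never the command line.
    return subprocess.run(cmd, env=env).returncode


if __name__ == "__main__" and sys.stdin is not None and sys.stdin.isatty():
    raise SystemExit(main())
```

Passing the token through the child's environment keeps it out of the command line, so it never appears in the window title or in process listings.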
