HN

HackerNews GitHub Archive

17,900+ Curated Repos • 2025

Get Instant Access

HackerNews GitHub Archive: 17,900+ Curated Repos (2025)

Stop wasting time digging through threads. Get every high-signal GitHub repo that hit HackerNews in 2025 — cleaned, verified, and ready to use.

CSV, JSON & Markdown • 17,900+ rows • Verified GitHub links • Commercial use allowed

What this is

A ready-to-use dataset: 17,900+ GitHub repositories posted to HackerNews during the 2025 source window. Each row contains the project title, the direct GitHub link, and the submission date — cleaned and de-duplicated so you get signal, not noise.

  • No scraping required
    We already did the extraction and cleanup. You get a clean CSV (plus JSON and Markdown) and can get straight to work.
  • Verified links
    Links were filtered and reviewed — dead links and spam removed.
  • Plug & play
    CSV (or JSON, or Markdown) loads into Excel, BigQuery, Pandas, or any analytics tool instantly.
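For example, loading a row in plain Python takes a few lines. This is only a sketch: the sample row and its values are invented for illustration, but the columns match the schema stated in the FAQ below (title, github_url, submission_date).

```python
import csv
import io

# Hypothetical sample row matching the dataset's stated columns;
# the values here are invented for illustration only.
sample = """title,github_url,submission_date
Example Project,https://github.com/example/project,2025-03-14
"""

rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["github_url"])  # https://github.com/example/project
```

The same `csv.DictReader` call works on the downloaded file directly; Pandas users can do the equivalent with a single `read_csv` call.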

What's included

  • 17,900+ curated rows (title, GitHub URL, submission date)
  • Clean CSV, JSON, and Markdown (UTF-8)
  • Zero duplicates, minimal noise
  • Instant download link after purchase
  • Commercial use allowed

Who this is for

Developers, indie hackers, founders, content creators, data scientists, and researchers who want a fast path to real-world repo signals — without grinding through threads and scraping failures.

Why this saves you weeks

  1. Time to collect: Manually scraping and verifying a full year of HackerNews posts takes days or weeks.
  2. Noise reduction: We removed non-GitHub links, spam, and duplicates — you get only real projects.
  3. Format headaches: Clean CSV, JSON, and Markdown load instantly into your tools — no parsing or schema drama.

Bottom line: do the valuable work — analysis, product design, model training — not the boring extraction and cleanup.

Legitimacy & Credibility

This dataset was compiled using date-based extraction from HackerNews listings, strict filtering for GitHub links, and manual cleanup to remove dead links and spam. No private data, no scraped credentials — only public postings and repo links.
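The "strict filtering for GitHub links" described above can be illustrated with a minimal sketch. This is not the vendor's actual pipeline — just the general shape of such a filter: keep only URLs that point at a repository path, i.e. github.com/&lt;owner&gt;/&lt;repo&gt;.

```python
import re

# Minimal sketch of a GitHub-repo link filter (illustrative only,
# not the actual pipeline used to build the dataset).
GITHUB_REPO = re.compile(r"^https?://github\.com/[\w.-]+/[\w.-]+/?$")

links = [
    "https://github.com/rust-lang/rust",   # repo link: kept
    "https://example.com/blog-post",       # not GitHub: dropped
    "https://github.com/sponsors",         # not an owner/repo path: dropped
]
repos = [u for u in links if GITHUB_REPO.match(u)]
print(repos)
```

A real pipeline would also follow redirects and check each link for a live repository, which is what the manual dead-link review covers.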

17.9k
Curated rows
2025
Source window
CSV
Ready to use
JSON
Ready to process
Markdown
Ready to render

FAQ

What format is the dataset?

CSV, JSON, and Markdown (UTF-8). Columns: title, github_url, submission_date.
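With that schema, simple aggregations need only the standard library. A small sketch — the rows below are invented placeholders shaped like the stated columns — counting submissions per month from the ISO dates:

```python
from datetime import date

# Hypothetical rows shaped like the stated schema (values invented).
rows = [
    {"title": "A", "github_url": "https://github.com/a/a", "submission_date": "2025-01-05"},
    {"title": "B", "github_url": "https://github.com/b/b", "submission_date": "2025-01-20"},
    {"title": "C", "github_url": "https://github.com/c/c", "submission_date": "2025-02-02"},
]

# Count submissions per calendar month.
per_month = {}
for r in rows:
    month = date.fromisoformat(r["submission_date"]).strftime("%Y-%m")
    per_month[month] = per_month.get(month, 0) + 1

print(per_month)  # {'2025-01': 2, '2025-02': 1}
```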

Will I get updates?

Yes. Corrections and updates covering the 2025 source window are included.

Can I use it commercially?

Yes. Use it in your projects, products, or research.

How accurate is the data?

Machine-filtered and hand-reviewed to remove obvious junk. All data comes from public postings — no private fields.

Ready to skip the scraping drama?

Get the full 17,900+ repo dataset and start building today.