HackerNews GitHub Archive: 17,900+ Curated Repos (2025)
Stop wasting time digging through threads. Get every high-signal GitHub repo that hit HackerNews for the last two years — cleaned, verified, and ready to use.
What this is
A ready-to-use dataset: 17,900+ GitHub repositories that were posted to HackerNews in 2025. Each row contains the project title, the direct GitHub link, and the submission date, cleaned and de-duped so you get signal, not noise.
- • No scraping required: We already did the extraction and cleanup. You get a clean CSV (plus JSON and Markdown) and can get straight to work.
- • Verified links: Links were filtered and reviewed; dead links and spam were removed.
- • Plug & play: The CSV (or JSON or Markdown) loads into Excel, BigQuery, Pandas, or any analytics tool instantly.
What's included
- • 17,900+ curated rows (title, GitHub URL, submission date)
- • Clean CSV, JSON, Markdown (UTF-8)
- • Zero duplicates, minimal noise
- • Instant download link after purchase
- • Commercial use allowed
Who this is for
Developers, indie hackers, founders, content creators, data scientists, and researchers who want a fast path to real-world repo signals — without grinding through threads and scraping failures.
Why this saves you weeks
- Time to collect: Manually scraping and verifying 2+ years of HackerNews posts takes days or weeks.
- Noise reduction: We removed non-GitHub links, spam, and repeats — you only get projects.
- Format headaches: CSV, JSON, and Markdown files that load instantly into your tools, with no parsing or schema drama.
Bottom line: do the valuable work — analysis, product design, model training — not the boring extraction and cleanup.
Legitimacy & Credibility
This dataset was compiled using date-based extraction from HackerNews listings, strict filtering for GitHub links, and manual cleanup to remove dead links and spam. No private data, no scraped credentials — only public postings and repo links.
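To give a feel for what "strict filtering for GitHub links" and de-duplication can look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the pipeline used to build this dataset: the function names `repo_key` and `filter_and_dedupe` are hypothetical, and the rule "host is github.com and the path starts with owner/repo" is an assumed definition of a repository link.

```python
from urllib.parse import urlparse

def repo_key(url):
    """Return 'owner/repo' in lowercase for a GitHub repository URL, else None.

    Assumption: a repository link has host github.com and at least two path
    segments. This simple rule still lets through non-repo pages such as
    github.com/features/actions, which would need a manual pass.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    if not repo:
        return None
    # GitHub owner and repo names are case-insensitive, so compare in lowercase.
    return f"{owner}/{repo}".lower()

def filter_and_dedupe(urls):
    """Keep the first occurrence of each repository, as a canonical URL."""
    seen = set()
    kept = []
    for url in urls:
        key = repo_key(url)
        if key is None or key in seen:
            continue
        seen.add(key)
        kept.append(f"https://github.com/{key}")
    return kept
```

Normalizing to a lowercase `owner/repo` key means links to an issue page, a `.git` clone URL, and the repo home page all collapse into one row.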
FAQ
What format is the dataset?
CSV, JSON, and Markdown (UTF-8). Columns: title, github_url, submission_date.
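As a quick illustration of working with those three columns, the sketch below reads rows with Python's standard `csv` module and counts submissions per month. The sample rows are made up, and the ISO date format (YYYY-MM-DD) is an assumption; check the actual file and adjust the parsing if it differs.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical sample rows in the documented column layout. The values and
# the date format are assumptions, not taken from the dataset itself.
SAMPLE_CSV = """title,github_url,submission_date
Example project A,https://github.com/example/project-a,2025-01-15
Example project B,https://github.com/example/project-b,2025-01-20
Example project C,https://github.com/example/project-c,2025-02-03
"""

def load_rows(handle):
    """Read rows and parse submission_date, assuming ISO 8601 (YYYY-MM-DD)."""
    rows = []
    for row in csv.DictReader(handle):
        row["submission_date"] = date.fromisoformat(row["submission_date"])
        rows.append(row)
    return rows

def submissions_per_month(rows):
    """Count rows per (year, month) of submission."""
    return Counter(
        (r["submission_date"].year, r["submission_date"].month) for r in rows
    )

rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = submissions_per_month(rows)
```

To run this against the downloaded file, pass `open(path, newline="", encoding="utf-8")` to `load_rows` in place of the `io.StringIO` sample.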
Will I get updates?
Yes. Updates that correct the 2025 data are included.
Can I use it commercially?
Yes. Use it in your projects, products, or research.
How accurate is the data?
Filtered by machine and reviewed by hand to remove obvious junk. Always public data — no private fields.
Ready to skip the scraping drama?
Get the full 17,900+ repo dataset and start building today.