
Building a Deep Hacker News Research Tool: Phase 1

14 mins

A conversation with Claude about extracting value from HN discussions.


Me: I’ve been thinking about building a tool to extract more value from Hacker News discussions. There’s so much knowledge buried in those threads, but it’s hard to process it all manually. What do you think?

Claude: That’s an interesting challenge! HN discussions often contain valuable insights, expert opinions, and real-world experiences. How would you like to approach this?

Me: Let’s start simple - maybe analyze the sentiment of comments to understand the overall tone of discussions?

Claude: That’s a good starting point. Let me show you what I’ve helped build so far.

Phase 1: The Sentiment Analyzer #

We created a Python-based tool that:

  1. Fetches HN threads using their API
  2. Builds a comment tree structure
  3. Analyzes sentiment of each comment
  4. Groups comments by sentiment (positive, negative, neutral)

Here’s what the basic structure looks like:

from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Comment:
    """Represents a single HN comment with metadata."""
    id: int
    text: str
    author: str
    timestamp: datetime
    parent_id: Optional[int]  # None for top-level comments
    children: List['Comment'] = field(default_factory=list)
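
The post never shows the HNClient that the fetching code below depends on. A minimal sketch, assuming httpx and the public HN Firebase API (the real client may differ):

import httpx

class HNClient:
    """Thin async wrapper around the public HN Firebase API."""

    BASE = "https://hacker-news.firebaseio.com/v0"

    def __init__(self) -> None:
        self._http = httpx.AsyncClient(timeout=10.0)

    async def get_item(self, item_id: int) -> dict:
        """Fetch a story or comment by ID; deleted items come back as null."""
        resp = await self._http.get(f"{self.BASE}/item/{item_id}.json")
        resp.raise_for_status()
        return resp.json() or {}

    async def aclose(self) -> None:
        await self._http.aclose()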

We use an async client to efficiently fetch comments:

async def fetch_comment_tree(client: HNClient, comment_id: int) -> dict:
    """Recursively fetch a comment and its replies."""
    comment = await client.get_item(comment_id)
    comment['child_comments'] = []  # the API only returns child IDs under 'kids'
    # Recursively fetch child comments
    for kid_id in comment.get('kids', []):
        child = await fetch_comment_tree(client, kid_id)
        comment['child_comments'].append(child)
    return comment
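
Steps 3 and 4, scoring and grouping, aren't shown in the post. The scores in the output below look like polarity values in [-1, 1], so here is a minimal sketch assuming NLTK's VADER analyzer; both the library choice and the bucket cutoffs are guesses:

# Requires: pip install nltk && python -m nltk.downloader vader_lexicon
from nltk.sentiment import SentimentIntensityAnalyzer

_analyzer = SentimentIntensityAnalyzer()

def score_comment(text: str) -> float:
    """Compound polarity score in [-1.0, 1.0]."""
    return _analyzer.polarity_scores(text)['compound']

def classify(score: float) -> str:
    """Bucket a score into positive/negative/neutral (illustrative cutoffs)."""
    if score > 0.1:
        return 'positive'
    if score < -0.1:
        return 'negative'
    return 'neutral'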
Running it against a recent thread:

venv➜  deep-hn git:(main) python analyze.py 43133207
⠼ Fetching comments tree...

The obvious lesson here is that fetching the tree recursively, one comment at a time, is very slow, but it’s a good starting point.
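
The slowness comes from awaiting each child before touching its siblings. One likely fix, sketched here but not yet in the tool, is to fetch sibling subtrees concurrently with asyncio.gather:

import asyncio

async def fetch_comment_tree_fast(client: HNClient, comment_id: int) -> dict:
    """Like fetch_comment_tree, but fetches sibling subtrees concurrently."""
    comment = await client.get_item(comment_id)
    # Fire off all child fetches at once instead of awaiting them one by one
    comment['child_comments'] = await asyncio.gather(
        *(fetch_comment_tree_fast(client, kid_id)
          for kid_id in comment.get('kids', []))
    )
    return comment

Either way, the output is already somewhat useful: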

╭─────── Thread Info ───────╮
│ The Deep Research problem │
│ By: cratermoon            │
│ Score: 238                │
│ Comments: 28              │
╰───────────────────────────╯

═══ Full Comment Tree ═══
Comment Tree
├── tptacek (-0.00): I did a trial run with Deep Research this weekend to do a comparative analysis of the comp packages ...
│   ├── WhitneyLand (0.12): >>The premise of this post seems to be that material errors in Deep Research results negate th...
│   └── aprilthird2021 (0.19): I'm just realizing this might finally be something that helps me get past analysis paralysis I ...
│       └── xmprt (-0.10): On the flipside, you might end up getting scammed even worse because of incorrect analysis. For exam...
│           └── infecto (0.01): While this will undoubtedly happen, I don't understand why this is a new phenomenon, the intern...
│               └── nxobject (0.25): I think the difference with Deep Research – and other hallucination and extrapolation-prone research...
├── tippytippytango (-0.07): I think of it as the return to 10 blue links. It searches the web, finds stuff and summarizes it so ...
│   ├── CreepGin (0.12): Agreed. Maybe we're moving toward a world where LLMs do all the searching, and "websites&q...
│   │   ├── ben_w (0.16): Interesting times, for sure.<p>&gt; and &quot;websites&quot; just turn into data-only endpoints made...
│   │   ├── svara (0.15): Interesting idea. AI can&#x27;t look at ads, so in the long run ads on informational material might ...
│   │   │   ├── mhuffman (0.00): &gt;AI can&#x27;t look at ads<p>But ads can be put in AI.
│   │   │   │   └── dingnuts (0.18): Ads? Too obvious. Just always suggest sponsors&#x27; products when it&#x27;s in context, and censor ...
│   │   │   └── adav (0.20): The ads will just become either bulk misinformation or carefully worded data points that nudge the A...
│   │   └── Seattle3503 (0.00): What do the websites get out of that exchange?
│   │       └── fragmede (0.00): therein lies the problem, and is why Google search didn&#x27;t disrupt itself until ChatGPT came aro...
│   └── dingnuts (-0.07): If you ignore the narrative and only look at the links then you&#x27;re just describing a search eng...
├── ano-ther (0.00): Similar to what Derek Lowe found with a pharma example: <a href="https:&#x2F;&#x2F;www.science.org&#...
│   └── stogot (0.00): Of which, people will surely die (when it is used to publish medical research by those wishing not t...
│       └── genewitch (0.03): I think the &quot;delve&quot; curve shows were already well into &quot;AI papers&quot; stage of civi...
├── lsy (0.20): Research skills involve not just combining multiple pieces of data, but also being able to apply ver...
├── FeepingCreature (0.16): &gt; Are you telling me that today’s model gets this table 85% right and the next version will get i...
│   ├── semi-extrinsic (0.03): If I&#x27;m paying a human, even a student working part-time or something, I expect &quot;concrete f...
│   │   ├── nuancebydefault (0.17): You can expect that from a human, but if you don&#x27;t know their reputation, you&#x27;d be lucky w...
│   │   └── FeepingCreature (0.50): Sure but there&#x27;s no step change.
│   └── smusamashah (0.15): A human WILL NOT make up non-existent facts, URLs, libraries and all other things.  Unless they deli...
│       └── jsjohnst (0.23): &gt; A human WILL NOT make up non-existent facts<p>Categorically not true and there’s so many exampl...
│           └── dingnuts (0.17): It&#x27;s absolutely true. Humans misremember details but I&#x27;ll ask an LLM what function I use t...
├── submeta (0.25): Deep Research is in its „ChatGPT 2.0“ phase. It will improve, dramatically. And to the naysayers: Wh...
│   ├── amelius (0.25): This is like saying: y=e^-x+1 will soon be 0, because look at how fast it went through y=2!
│   │   ├── submeta (0.04): Many past technologies have defied “it’s flattening out” predictions. Look at Personal computing, th...
│   │   │   ├── j_maffe (0.25): &gt; Many past technologies have defied “it’s flattening out” predictions.<p>And many haven&#x27;t
│   │   │   └── dingnuts (-0.18): everything you listed was subject to the effects of Moore&#x27;s Law, explaining their trajectories,...
│   │   ├── PeterFBell (0.35): Thanks for making my day :)
│   │   ├── kridsdale3 (0.00): I appreciate your style of humor.
│   │   └── whyenot (0.04): Tony Tromba (my math advisor at UCSC) used to tell a low key infuriating, sexist and inappropriate s...
│   │       ├── Y_Y (0.07): (from a sibling&#x27;s link)<p>&gt; A mathematician and a physicist agree to a psychological experim...
│   │       ├── rpmisms (0.00): This sounds like a joke with a lot of truth, even if it is offensive.
│   │       ├── john_minsk (0.00): Can I have a joke?
│   │       └── lynx97 (-0.20): 
│   ├── hiq (0.16): &gt; Now after two years look at Cursor, aider, and all the llms powering them, what you can do with...
│   ├── nicksrose7224 (-0.33): disagree - i actually think all the problems the author lays out about Deep Research apply just as w...
│   │   └── simonw (0.11): I think Deep Research shows that these things can be very good at precision and recall of informatio...
│   │       └── benedictevans (-0.03): Deep Research doesn’t give the numbers that are in statcounter and statista. It’s choosing the wrong...
│   │           └── simonw (0.11): Wow, that&#x27;s really surprising. My experience with much simpler RAG workflows is that once you s...
│   │               └── benedictevans (-0.17): Have a look at the previous essay. I couldn&#x27;t get ChatGPT 4o to give me a number in a PDF corre...
│   │                   └── simonw (0.07): I have a hunch that&#x27;s a problem unique to the way ChatGPT web edition handles PDFs.<p>Claude ge...
│   │                       └── benedictevans (0.07): Interesting, thanks.  I think the higher level problem is that 1: I have no way to know this failure...
│   │                           └── simonw (0.03): Yeah, completely understand that. I talked about this problem on stage as an illustration of how inf...
│   ├── dchichkov (0.07): I agree, they are only starting the data flywheel there.  And at the same time making users pay $200...
│   │   ├── nuancebydefault (0.50): The interns of today are tomorrow&#x27;s skilled scientists.
│   │   └── moduspol (0.00): Just FYI: They did roll out Deep Research to those of us on the $20&#x2F;mo tier at (I think) about ...
│   └── fragmede (-0.02): Unfortunately that&#x27;s not how trust works. If someone comes into your life and steals $1,000, an...
├── rollinDyno (0.27): Everyone who has been working on RAG is aware of how important source control is. Simply directing y...
│   └── jslakro (-0.30): Human-in-the-loop (HITL) a buzzword that has become common these days
├── simonw (0.20): When ChatGPT came out, one of the things we learned is that human society generally assumes a strong...
│   ├── isaiahwp (0.35): To be fair, OpenAI&#x27;s the one marketing it as such.
│   ├── immibis (0.00): That&#x27;s been every LLM since GPT-2.
│   └── SubiculumCode (0.40): In some ways, it&#x27;s a good tool to teach yourself to sus out the real clues to reliability, not ...
│       └── j_maffe (0.00): But that&#x27;s the thing. The only way to truly find out if it&#x27;s reliable (&gt;90%) is to chec...
├── baxtr (-0.06): I urge anyone to do the following: take a subject you know really really well and then feed it into ...
│   ├── caseyy (0.00): In my experience, Perplexity and OpenAI&#x27;s deep research tools are so misleading that they are a...
│   │   └── brokencode (-0.01): &gt; “They will be harmed as a result.”<p>Compared to what exactly? The ad-fueled, SEO-optimized nig...
│   │       ├── simianparrot (0.00): It’s deceptive by design because there is no reasoning, and humans created it and know this.
│   │       └── caseyy (0.05): OpenAI knows the tool it markets as “research” does not pass muster. It hallucinates, mid-quotes sou...
│   ├── ilrwbwrkhv (0.25): Yup none of these tools are actually any close to AGI or &quot;research&quot;. They are still a much...
│   ├── jsemrau (0.20): In my case very &quot;not useful&quot;. Background, I am writing a Substack where I write &quot;deep...
│   └── genewitch (0.00): Murrai Gell-Mann amnesia
├── Lws803 (0.04): I always wondered, if deep research has an X% chance of producing errors in it&#x27;s report and you...
│   └── ImaCake (0.17): It might depend on how much you struggle with writers block. An LLM essay with sources is probably a...
├── light_triad (0.25): &quot;Deep research&quot; is super impressive, but so far is more &quot;search the web and surf page...
│   └── j_maffe (0.20): &gt; It is in many ways a workaround to Google&#x27;s SEO poisoning.<p>But the article goes into exa...
├── Alifatisk (0.85): What a beautiful website
├── smusamashah (-0.06): Watched recent Viva la dirt league videos on how trailers lie and do false promises. Now I see LLM a...
├── kgeist (0.00): Deep Research, as it currently stands, is a jack of all trades but a master of none. Could this prob...
├── jppope (0.17): These days I&#x27;m feeling like GenAi is basically an accuracy rate of 95% maybe 96%. Great at boil...
│   ├── daxfohl (-0.03): I think it&#x27;s not the valuable stuff though. The valuable stuff is all the boilerplate, because,...
│   └── bakari500 (-0.36): Yeah but you have 4 to 6 % error that’s not good even if you have dumb computer
├── nuancebydefault (0.22): The problem with tools like deep research is that they imply good reasoning skills of the underlying...
├── iandanforth (0.13): I&#x27;ll share my recipe for using these products on the off chance it helps someone.<p>1. Only do ...
│   ├── tkgally (0.00): &gt; ... perform the search in multiple products<p>I do that a lot, too, not only for research but f...
│   └── munchler (0.50): This makes sense. How many of those products do you have to pay for?
│       └── kridsdale3 (0.28): I&#x27;m not OP but I do similar stuff. I pay for Claude&#x27;s basic tier, OpenAI&#x27;s $200 tier,...
│           ├── visarga (0.04): So you are basically doing a first pass with diverse models and second pass catches contradictions a...
│           └── munchler (0.35): Wow, that&#x27;s eye-opening. So, just to be clear, you&#x27;re paying for Claude and OpenAI out of ...
├── somerandomness (0.16): Indeed, the main drawback of the various Deep Research implementation is the quality of sources is d...
├── zeckalpha (0.10): Two factors to consider: human performance and cost.<p>Plenty of humans regularly make similar mista...
├── visarga (0.39): The thig is, if you look at all the &quot;Deep Research&quot; benchmark scores, they never claim to ...
├── jwpapi (0.12): Yes the confidence tbh is getting a bit out of hand. I see the same thing with coding with our SAAS,...
├── franze (0.16): I am currently in India in a big city doing yoga for the first time as a westerner.<p>I dont Google ...
├── spoaceman7777 (0.29): I, for one, have it in my prompt that GPT should end every message with a message about how sure it ...
├── skywhopper (0.14): This article covers something early on that makes the question of “will models get to zero mistakes”...
├── furyofantares (0.04): I used deep research with o1-pro to try to fact&#x2F;sanity check a current events thing a friend wa...
└── theGnuMe (0.02): One other existential question is Simpson&#x27;s paradox, which I believe is exploited by politician...
    └── ImaCake (0.10): I&#x27;ve never thought of Simpson&#x27;s Paradox as a political problem before, thanks for sharing ...
        └── theGnuMe (0.00): What examples do you have for Bayes vs freq; and molecular vs biochem?

═══ Positive Comments (45) ═══
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Alifatisk (0.85)                                                                                                                                                │
│ What a beautiful website...                                                                                                                                     │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ FeepingCreature (0.50)                                                                                                                                          │
│ Sure but there&#x27;s no step change....                                                                                                                        │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ nuancebydefault (0.50)                                                                                                                                          │
│ The interns of today are tomorrow&#x27;s skilled scientists....                                                                                                 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ munchler (0.50)                                                                                                                                                 │
│ This makes sense. How many of those products do you have to pay for?...                                                                                         │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ SubiculumCode (0.40)                                                                                                                                            │
│ In some ways, it&#x27;s a good tool to teach yourself to sus out the real clues to reliability, not format and authoritative tone....                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

═══ Negative Comments (7) ═══
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ bakari500 (-0.36)                                                                                                                                               │
│ Yeah but you have 4 to 6 % error that’s not good even if you have dumb computer...                                                                              │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ nicksrose7224 (-0.33)                                                                                                                                           │
│ disagree - i actually think all the problems the author lays out about Deep Research apply just as well to GPT4o &#x2F; o3-mini-whatever. These things just are │
│ absolutely terrible at precision &amp; r...                                                                                                                     │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ jslakro (-0.30)                                                                                                                                                 │
│ Human-in-the-loop (HITL) a buzzword that has become common these days...                                                                                        │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ lynx97 (-0.20)                                                                                                                                                  │
│ ...                                                                                                                                                             │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ dingnuts (-0.18)                                                                                                                                                │
│ everything you listed was subject to the effects of Moore&#x27;s Law, explaining their trajectories, but Moore&#x27;s Law doesn&#x27;t apply AI in any way. And │
│ it&#x27;s dead....                                                                                                                                              │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

═══ Neutral Comments (43) ═══
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ zeckalpha (0.10)                                                                                                                                                │
│ Two factors to consider: human performance and cost.<p>Plenty of humans regularly make similar mistakes to the one in the Deep Research marketing, with more    │
│ overhead than an LLM....                                                                                                                                        │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ImaCake (0.10)                                                                                                                                                  │
│ I&#x27;ve never thought of Simpson&#x27;s Paradox as a political problem before, thanks for sharing this!<p>Arguably this applies just as well to Bayesian vs   │
│ Frequentist statisticians or Molecular vs ...                                                                                                                   │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ dingnuts (-0.07)                                                                                                                                                │
│ If you ignore the narrative and only look at the links then you&#x27;re just describing a search engine with an AI summarization feature. You could just use    │
│ Kagi and click &quot;summarize&quot; on the...                                                                                                                  │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ simonw (0.07)                                                                                                                                                   │
│ I have a hunch that&#x27;s a problem unique to the way ChatGPT web edition handles PDFs.<p>Claude gets that question right: <a                                  │
│ href="https:&#x2F;&#x2F;claude.ai&#x2F;share&#x2F;7bafaeab-5c40-434f-b849...                                                                                    │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ dchichkov (0.07)                                                                                                                                                │
│ I agree, they are only starting the data flywheel there.  And at the same time making users pay $200&#x2F;month for it, while the competition is only charging  │
│ $20&#x2F;month.<p>And note, the system is...                                                                                                                    │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The “sentiment analyzer” is just the beginning, and we need to dig deeper to extract the most value from HN.

Learnings from Phase 1 #

  • Sentiment Isn’t Enough: While analyzing sentiment gives us a high-level view of discussions, it misses the nuanced technical insights that make HN valuable.
  • Context Matters: Technical discussions often use neutral language but contain valuable information. A negative sentiment might actually be constructive criticism.
  • Threading is Key: HN’s threaded discussions often show how ideas evolve and get refined through conversation.

The Vision: A Deep Research Tool #

After building the initial sentiment analyzer, I realized we could do so much more. Here’s what a proper HN research tool could look like:

  1. Knowledge Extraction (a sketch of reference extraction follows this list)
     • Identify technical insights and code snippets
     • Extract references (papers, GitHub repos, blogs)
     • Tag topics and domains
     • Track evolving discussions
  2. Expertise Analysis
     • Build expertise profiles for contributors
     • Track domain specializations
     • Measure technical depth of discussions
     • Identify influential comments
  3. Knowledge Graph
     • Connect related discussions across threads
     • Map topic relationships
     • Create citation networks
     • Track concept evolution
  4. Research Capabilities
     • Topic-based navigation
     • Expert directories
     • Historical trend analysis
     • Automated research summaries
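
None of this exists yet, but the reference-extraction piece is easy to sketch. HN comment bodies arrive as HTML with entity-encoded hrefs, so a hypothetical helper could pull outbound URLs like this:

import html
import re
from typing import List

# hrefs in raw HN comment HTML use entities such as &#x2F; for slashes
LINK_RE = re.compile(r'href="([^"]+)"')

def extract_references(comment_html: str) -> List[str]:
    """Pull outbound URLs from a raw HN comment body."""
    return [html.unescape(url) for url in LINK_RE.findall(comment_html)]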

Next Steps #

The initial sentiment analyzer was just the beginning. The real value will come from:

  • Building a comprehensive knowledge graph
  • Creating topic tracking systems
  • Implementing expertise identification
  • Developing trend analysis capabilities