AI for Epistemics: Hackathon Showcase

Hackathon submission form

by Austin Chen

Use this form to submit your hackathon projects!

Question Generator

by Gustavo Lacerda

This is a browser extension that generates forecasting questions related to the news page you are visiting.

Symphronesis

by Campbell Hutcheson

Automated comment merging for LessWrong: it finds disputes between the comments and the post text, then highlights the disputed passages, color coded, so you can mouse over a highlight and jump to the relevant comment.

Manifund Eval

by Ben Rachbach, William Saunders

Screens all Manifund projects to identify the ones worth a closer look for funding. It also extracts each grant's story for how it could help transformative AI go well, so you can review that summary and save time in your evaluation. This makes it feasible to quickly sift through the large number of Manifund projects and find promising candidates. Code: https://github.com/brachbach/manifund_eval

Squaretable

by David Nachman

To assist a user in decision-making, the app uses LLMs to help the user come up with weighted factors, possible options, and factor values for each option. The UI consists of an always-displayed table of the factors, options, weights, and values. The final score for each option is computed symbolically as a weighted sum based on the values and weights. LLMs help the user evolve the table in a few ways:

- After each iteration, the LLM asks the user a question exploring their decision-making criteria and situation (which the user can answer or ignore)
- The user can ask the LLM questions
- Based on the user's response or question, the LLM evolves the decision table
- In the background, LLMs also generate "Things to consider" across multiple categories (e.g. other factors to consider, questions about the evidence for values, big-picture considerations) that suggest changes to the user
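The weighted-sum scoring described above can be sketched in a few lines. This is an illustrative reconstruction, not the app's actual code; the factor names, weights, and values are made up:

```python
# Hypothetical Squaretable-style scoring: each option's score is the
# weighted sum of its factor values. All names and numbers are illustrative.
factors = {"cost": 0.5, "quality": 0.3, "speed": 0.2}  # factor -> weight

options = {
    "Option A": {"cost": 7, "quality": 9, "speed": 4},
    "Option B": {"cost": 8, "quality": 6, "speed": 9},
}

def score(values: dict, weights: dict) -> float:
    """Weighted sum of factor values for one option."""
    return sum(weights[f] * values[f] for f in weights)

scores = {name: score(vals, factors) for name, vals in options.items()}
```

Keeping the aggregation symbolic like this (rather than asking the LLM to compute scores) means the ranking is always auditable: the LLM only proposes edits to the table, and the arithmetic stays deterministic.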

Detecting Fraudulent Research

by Panda, Charlie

There's a lot of published research, and a lot of it seems bad. How much? We use language models to try to detect retraction-worthy errors in the published literature, reasoning purely from first principles without using meta-textual information.

Artificial Collective Intelligence

by Evan Hadfield

ACI is a consensus-finding tool in the style of Pol.is or Community Notes, simulating a diverse range of perspectives.

Thought Logger and Cyborg Extension

by Raymond Arnold

I have a pair of products:

- a keylogger, which tracks all your keystrokes (except from apps you put on a blocklist) and exposes them on a local server
- a "prompt library" Chrome extension, which lets me store fairly complicated prompts

Double-cruxes in the New York Times’ “The Conversation”

by Tilman Bayer

"The Conversation" is a weekly political debate format in the New York Times "Opinion" section between conservative(ish) journalist Bret Stephens and liberal(ish) journalist Gail Collins, ongoing since 2014. I used Gemini 2.0 Flash Thinking to identify double-cruxes in each debate, with the aim of tracking both participants' shifts over time.

Trying to make GPT 4.5 Non-sycophantic (via a better system prompt)

by Oliver Habryka

I tried to make a system prompt for GPT 4.5 that actually pushes back on things I say and that I can argue with in productive ways. It isn't perfect, but honestly a good bit better than other experiences I've had arguing with LLMs.

System prompt:

You are a skeptical, opinionated rationalist colleague—sharp, rigorous, and focused on epistemic clarity over politeness or consensus. You practice rationalist virtues like steelmanning, but your skepticism runs deep. When given one perspective, you respond with your own, well-informed and independent perspective.

Guidelines: Explain why you disagree. Avoid lists of considerations. Distill things down into generalized principles. When the user pushes back, think first about whether they actually made a good point. Don't just concede all points. Give concrete examples, but make things general. Highlight general principles. Steelman ideas briefly before disagreeing. Don't hold back from blunt criticism. Prioritize intellectual honesty above social ease. Flag when you update. Recognize you might have misunderstood a situation. If so, take a step back and genuinely reevaluate what you believe. In conversation, be concise, but don't avoid going on long explanatory rants, especially when the user asks.

Tone:
“IDK, this feels like it’s missing the most important consideration, which is...”
“I think this part is weak; in particular, it seems in conflict with this important principle...”
“Ok, this part makes sense, and I totally missed that earlier. Here is where I am after thinking about that”
“Nope, sorry, that missed my point completely, let me try explaining again”
“I think the central guiding principle for this kind of decision is..., which you are missing”

Interpretable apperception

by Kirill Chesnov

I wrote an LLM-aided converter between natural language and logic-program formulations, for use with Richard Evans's apperception engine. This approach is a first step towards provably optimal hypothesis generation from raw sensory inputs described in plain language.