1. Data sources
We pull tweets from the public X (Twitter) API for whichever handle a user searches. Each post is stored once, keyed by its tweet ID, so re-running the same handle never re-bills X. End-of-day prices come from EODHD and are cached locally per (ticker, date).
For each tracked account we maintain a closed window [earliest_fetched_at, last_fetched_at] of what we've already mirrored. New analyses only request slices we haven't already pulled — re-runs are incremental and don't re-fetch the middle.
2. From tweet to recommendation
A classifier (Anthropic's Claude) reads each tweet and answers one binary question: is this a new investment call? From posts that pass, it extracts one or more (ticker, direction, asset_class) rows plus, when stated explicitly, an entry price, exit price target, horizon in days, or a catalyst date.
The rejection bucket explains why a post produced zero recommendations:
- recap — the author is talking about a position they already established
- list — multiple names dropped without an actionable stance on each
- commentary — macro takes, snark, questions, watchlists
- other — everything else (off-topic, ambiguous, noise)
Only posts classified as new_call generate Recommendation rows. A post that's already been classified is never re-sent to the model — its existing rows are reused on every subsequent analysis.
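The shape of the classifier's output can be sketched with dataclasses. Field and class names here are illustrative, not the actual schema exchanged with the model:

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

@dataclass
class Classification:
    """One classified post. Only label == "new_call" carries recommendations;
    the other labels are the rejection buckets and produce zero rows."""
    label: Literal["new_call", "recap", "list", "commentary", "other"]
    recommendations: list["Recommendation"] = field(default_factory=list)

@dataclass
class Recommendation:
    """One (ticker, direction, asset_class) row extracted from a new_call post.
    The optional fields are filled only when the author stated them explicitly."""
    ticker: str
    direction: Literal["long", "short"]
    asset_class: str
    entry_price: Optional[float] = None
    price_target: Optional[float] = None
    horizon_days: Optional[int] = None
    catalyst_date: Optional[str] = None  # ISO date string, when stated
```

Because classifications are cached per post, these rows are written once and reused on every later analysis.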
3. Entry & exit rules
Entry
Every recommendation enters at the close of the next trading day after it was posted. If that day's close hasn't yet been published (because the market hasn't closed yet), we fall back to the post-day close so very recent picks still get valued. We deliberately never use the in-tweet "buying at $X" price — that's the author's preference, not an executable order; everyone reading the tweet enters tomorrow at best.
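The entry rule reduces to a small lookup. A minimal sketch, assuming `closes` maps ISO date strings to split-adjusted closing prices (the function name and data shape are assumptions):

```python
def entry_price(post_date: str, closes: dict[str, float]):
    """Entry close for a recommendation posted on `post_date`.

    Prefer the first available close strictly after the post date ("everyone
    reading the tweet enters tomorrow at best"); if that close hasn't been
    published yet, fall back to the post-day close so very recent picks
    still get valued. ISO date strings sort chronologically.
    """
    later = sorted(d for d in closes if d > post_date)
    if later:
        return closes[later[0]]
    return closes.get(post_date)  # fallback for very recent picks
```

Note the stated in-tweet price never enters this function at all.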
Exit
The earliest of the following stated triggers fires the exit:
- Price target (PT): the daily high (long) or low (short) crosses the stated PT — exits at that day's close.
- Horizon: entry date plus the stated horizon in days has elapsed — exits at the next available close.
- Catalyst date: one trading day after the stated event — exits at that day's close.
If no trigger has fired yet, the recommendation stays open and is marked-to-market against the most recent available close. If no trigger was stated at all, the rec is open forever (until a future re-classification adds one). The exit_trigger column on each rec records which path actually fired: pt, horizon, catalyst, or open.
We use raw daily high/low for the PT-touch test (PTs are quoted in unadjusted terms) and split-adjusted close for the return math, so corporate actions don't show up as cliff-edges in the curve.
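The trigger scan above can be sketched as a walk over the daily bars following entry. This is a simplified sketch: `rec` and `bars` are illustrative dict shapes, the PT test uses the raw (unadjusted) high/low as described, and when two triggers fire on the same day this version resolves them in a fixed pt → horizon → catalyst order, which is an assumption:

```python
def evaluate_exit(rec: dict, bars: list[dict]):
    """Return (exit_trigger, exit_date) for the earliest fired trigger,
    or ("open", None) if nothing has fired yet.

    `bars` is date-ordered, starting the day after entry, each with raw
    'high'/'low', 'date' (ISO string), and 'days_since_entry'.
    """
    for bar in bars:
        if rec.get("price_target") is not None:
            touched = (bar["high"] >= rec["price_target"]
                       if rec["direction"] == "long"
                       else bar["low"] <= rec["price_target"])
            if touched:
                return "pt", bar["date"]          # exits at that day's close
        if (rec.get("horizon_days") is not None
                and bar["days_since_entry"] >= rec["horizon_days"]):
            return "horizon", bar["date"]         # next available close
        if (rec.get("catalyst_date") is not None
                and bar["date"] > rec["catalyst_date"]):
            return "catalyst", bar["date"]        # one trading day after event
    return "open", None                           # marked-to-market meanwhile
```

A rec with no stated trigger simply never leaves the `("open", None)` branch, matching the "open forever" behavior.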
4. Return math
Each recommendation has exactly one return, computed as:
- Long: (exit − entry) / entry
- Short: (entry − exit) / entry
For closed positions exit is the price at which the trigger fired. For open positions exit is today's last available close — the rec is marked-to-market until its trigger fires (or never does).
Returns are decimal — 0.05 means 5%. We do not annualize per-trade returns; the holding period varies trade to trade and an annualization factor would misrepresent shorter holds.
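In code, the whole return model is two lines (the function name is illustrative):

```python
def trade_return(direction: str, entry: float, exit_price: float) -> float:
    """Decimal per-trade return: 0.05 means 5%. Not annualized.

    For closed positions `exit_price` is the close at which the trigger
    fired; for open positions it's the latest available close.
    """
    if direction == "long":
        return (exit_price - entry) / entry
    return (entry - exit_price) / entry  # short: profits when price falls
```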
5. Per-tweet vs per-stock
The same set of recommendations is aggregated two ways, side-by-side:
- Per-tweet: every rec is its own trade. Concentrated handles (someone who tweets the same name seven times in a week) get amplified — this measures "follow every signal" P&L.
- Per-stock: deduped to one position per (ticker, direction) at a time. A new position only opens after the prior one for the same key has actually closed — this measures stock-picking skill, not repetition.
The leaderboard is ranked on per-stock equal-weight return — the deduped view — to avoid rewarding handles who simply tweet the same idea more often.
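The per-stock dedup can be sketched as a single pass over date-ordered recs. Field names are illustrative dict keys, not the actual schema:

```python
def dedupe_per_stock(recs: list[dict]) -> list[dict]:
    """Collapse the per-tweet rec list into per-stock positions.

    `recs` are sorted by 'entry_date', each with 'ticker', 'direction',
    and 'exit_date' (None while still open). A repeat of the same
    (ticker, direction) key is dropped unless the prior position closed
    before the new one would enter.
    """
    open_until = {}  # (ticker, direction) -> exit_date of the active position
    kept = []
    for r in recs:
        key = (r["ticker"], r["direction"])
        if key in open_until:
            prior_exit = open_until[key]
            # prior position still open (None) or overlapping: skip the repeat
            if prior_exit is None or r["entry_date"] <= prior_exit:
                continue
        kept.append(r)
        open_until[key] = r["exit_date"]
    return kept
```

Under this pass, "the same name seven times in a week" collapses to one position, while a genuinely re-opened trade after a close still counts.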
6. Risk metrics
The leaderboard surfaces three risk metrics alongside return and hit rate:
Volatility
Sample standard deviation (ddof = 1) of the per-trade return series. Higher volatility means individual calls swing wider; lower means returns cluster. It's a per-trade metric, not annualized.
Sharpe ratio
mean(returns) / stdev(returns) over the same per-trade series. We do not subtract a risk-free rate (per-trade horizons vary, so a single rf wouldn't apply uniformly) and we don't annualize. A higher Sharpe means more return per unit of risk taken.
Maximum drawdown
Peak-to-trough decline of a chained per-trade equity curve. We sort the trades by entry date, multiply 1.0 × (1 + r₁) × (1 + r₂) × … in order, and record the largest fractional drop from any prior high. Reported as a negative number (e.g. −0.18 = 18% drawdown). Needs at least two trades to be meaningful.
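All three metrics are short enough to state directly. A sketch using the standard library (function names are illustrative):

```python
from statistics import mean, stdev

def volatility(returns: list[float]) -> float:
    """Sample standard deviation (ddof = 1) of per-trade returns; not annualized."""
    return stdev(returns)

def sharpe(returns: list[float]) -> float:
    """mean / stdev over the same per-trade series; no risk-free rate subtracted,
    no annualization."""
    return mean(returns) / stdev(returns)

def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough drop of the chained equity curve, as a negative
    number (-0.18 = an 18% drawdown). Trades must be ordered by entry date;
    needs at least two trades to be meaningful.
    """
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r           # chain 1.0 × (1 + r₁) × (1 + r₂) × …
        peak = max(peak, equity)    # running high-water mark
        worst = min(worst, equity / peak - 1.0)
    return worst
```

Note `statistics.stdev` is the sample (ddof = 1) form, matching the definition above.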
Risk metrics are computed and stored with each new analysis run, so older analyses that pre-date these fields show "—" until they're re-run.
7. Leaderboard ranking
One row per handle, showing the most recent completed analysis whose lookback exactly matches the selected timeframe (30/90/180/365). Sorted by per-stock equal-weight return, descending. The lookback is the historical depth being scored — it's not the holding period of any one call.
Each row also shows the dominant asset class (the class with the most picks in the window), so the leaderboard can be filtered to "crypto-only," "equity-only," etc. without a second round-trip.
8. Community sentiment
Signed-in users can give a handle a thumbs up or thumbs down. One stance per (user, handle); re-clicking the same vote toggles it off. The leaderboard surfaces the count of each side per row — pure crowd sentiment, separate from the algorithmic score.
Votes are not used in the ranking. They're a parallel signal you can weigh however you like — useful when two handles have similar numbers but very different reputations.
9. Portfolio Builder & Tracker
Builder
Aggregates every handle's currently-open positions into one portfolio. Handle weight comes from their per-stock return (clipped at zero — only profitable handles contribute). Within each handle, open positions are equal-weighted. Long and short on the same ticker are kept as separate positions. An optional inverse-vol adjustment divides each ticker's weight by its 90-day daily-return stdev so the highest-volatility names don't dominate.
Tracker
A daily-rebalanced backtest of that portfolio. Each trading day stores the composition at that day's close; the realized return is the 1-day P&L from holding it into the next day's close. Cumulative NAV chains those returns from a 1.0 base. The composition changes day-to-day as the leaderboard moves — this is not "buy and hold," it's a rolling target weight.
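The NAV chaining itself is a one-liner in spirit; the daily compositions and their 1-day P&L are computed upstream. A minimal sketch (function name is illustrative):

```python
def nav_curve(daily_returns: list[float]) -> list[float]:
    """Chain 1-day portfolio returns into a cumulative NAV from a 1.0 base.

    Each entry is the realized return from holding one day's composition
    into the next day's close; the composition is re-derived daily, so this
    is a rolling target weight, not buy-and-hold.
    """
    nav, curve = 1.0, [1.0]
    for r in daily_returns:
        nav *= 1.0 + r
        curve.append(nav)
    return curve
```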
10. Limits & caveats
The honest list of where these numbers can mislead:
- X timeline ceiling. The X API returns at most ~3,200 of an account's most recent tweets. Long-running, high-volume accounts will have older calls clipped — long lookbacks for prolific handles are an undercount.
- Classifier judgment. Whether a post was a "real call" or "just commentary" is an LLM decision. We don't re-classify already-classified posts, so reclassifications only happen on explicit force-reclassify or future model upgrades. False negatives (a real call rejected as recap) reduce a handle's apparent activity but don't bias the score; false positives (commentary turned into a phantom trade) inflate the trade count and noise.
- Stated trigger fidelity. If the author says "PT $80, 1-month horizon" we trust them. If they later move the PT in a follow-up tweet, we currently stick with the original — re-classification is the only way to pick up amendments.
- Survivorship. We only see the calls accounts make publicly. Deleted tweets, locked accounts, and rebrands are gaps we can't fill.
- Slippage & fees. Returns are computed on close prices with no transaction costs. Real-world execution would shave the headline numbers, especially on small-cap names.
- Crypto / FX / commodity coverage. Price data is best for major equities and the largest crypto names. Thinly-traded tickers may be missing days or have stale closes.
- Re-run lag. A handle's leaderboard score is a snapshot of their last analysis. New tweets posted today don't move the score until a fresh analysis runs.
- Not financial advice. This site exists to evaluate accountability, not to recommend trades. Past performance — even when correctly measured — has well-documented limits as a predictor of future results.
Spotted something you think we got wrong? The pipeline is deterministic, so reproducing a result is straightforward. Open the API docs to walk through every step yourself.