cache

A simple tokenmaxxing playbook. Briefly talk about how to increase token throughput (queues, fan-out, more context, loops) to generate high quality work (repetition, verification, evals).

Closing the software loop

  • PR review agents, PR generation agents (review error logs, confirm the error, reproduce the error)
  • Issue triage agents (labeling, adding context to PRs), documentation agents (review code diffs to update knowledge), security/audit agents
  • Test writing agents, refactor/migration, Q&A agents, monitoring/incident/cause grouping agents, dependency audit agents, etc.

Patterns

  • Humans gate where being wrong is expensive
  • Tokenmaxxing = context size x pass count x call count x frequency x io type (pdf, datasheet, diagram, text, etc.)
  • Add context, world expansion, generate more things
  • Multiple passes: spawn independent agent passes on the same problem, verify passes, solve same problem multiple times (fixes confidently wrong outputs, stress test pros and cons)
  • Spawn many agents to create queues to drain them (multiple step research, crawlers, testing many ideas)
  • Synthetic data generation: training examples, test cases, etc.
  • Loops to revise (generate, critique, revise loop)
  • Map-reduce over large inputs
  • Workarounds for throughput stoppers (e.g. rate limits, human intervention)
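The "multiple passes" pattern can be sketched roughly as below. This is a minimal illustration, not a real agent setup: `fake_agent` is a hypothetical stand-in for a model call, and the 0.8 agreement threshold for escalating to a human is an arbitrary assumption.

```python
from collections import Counter
from typing import Callable, List, Tuple


def run_passes(agent: Callable[[str, int], str], problem: str, passes: int) -> List[str]:
    """Run the same problem through `passes` independent agent calls."""
    return [agent(problem, seed) for seed in range(passes)]


def majority_vote(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the share of passes that agreed with it."""
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


def fake_agent(problem: str, seed: int) -> str:
    # Hypothetical stand-in for a real model call: wrong on every third pass.
    return "41" if seed % 3 == 2 else "42"


answers = run_passes(fake_agent, "What is 6 * 7?", passes=5)
answer, agreement = majority_vote(answers)

# Hypothetical gate: low agreement between passes goes to a human.
needs_human = agreement < 0.8
print(answer, agreement, needs_human)
```

Agreement between passes is only a cheap proxy for correctness, so a separate verify pass or an eval set would still be needed where being wrong is expensive.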

Tooling

  • MCPs (access layer like github, CI, sentry, etc.), Claude Code, Agent SDKs
  • Schedule (cron), Loop (watch)
  • Skills (how to do things; pr-review, pr-generation, etc.); use evals on skills (positive and negative cases) for consistent results
  • skill-creator + eval-viewer to scaffold skills and review results

Guardrails

  • Eval set for each agent/skill (set expectations on correctness)
  • Measure output delta with different agents and skills or without (choose best one)
  • Check output / usage; check token usage and downstream metrics

Disclaimer: all the projects mentioned in this post were created two months ago, maybe a month ago. This post was written around that time frame, and then just slowly edited.

Humans have always distinguished themselves via their tooling. To be much more productive is empowering. But admittedly, some existentialist pangs were felt. This post is me building a couple of apps, some used by me daily, some deployed, some proof of concept, all of which are built with Opus 4.6.


Chansey – Chansey is a RAG application that takes medical questions, translating them into PubMed queries using an LLM. The answers are classified (diagnostic, comparative, guideline, etc.) and then AI-generated using text from relevant citations. I wanted to build an OpenEvidence clone; this is a Temu version with a much simpler RAG, directly querying PubMed, no caching. You can try it out on chansey.bwang.io

Diglett – I wanted to find government contract opportunities on SAM.gov. Diglett scrapes SAM.gov and uses an LLM to read attachments and analyze potential procurement opportunities. It also generates documents and emails to bid on contracts. Interfacing with government contracts is still laden with human involvement. It runs locally and uses Google Sheets as a back-end so that someone can manage these opportunities in the future. Why SAM.gov? The website for Canadian contracts is bad, and United Nations / UNGM is sort of behind a paywall. I have set up an LLC and made some submissions, fingers crossed.

Starmie – Fundamental analysis and screening of global small-cap stocks. After reading posts such as those from dirt, I had some success last year picking stocks (see my stale Substack). Starmie is my effort to reproduce the process, periodically picking a few high-conviction small-cap stocks. It uses a PostgreSQL database on Neon powering Grafana, which displays a few metrics I care about: EV/EBIT, ROIC, NCAV.

Available at https://stonks.bwang.io/ (Deployed version is very slow)

Munchlax – Some restaurants are too hard to book. Restaurant sniper that polls popular restaurants on Resy, instantly booking when the restaurant becomes available. React Frontend. Go (go-chi) + PostgreSQL backend, storing profiles, watches (snipe job), bookings (successful reservations), watch events (event log for status changes). I have used this to successfully book a restaurant in Chicago and I might use it again in the future.

Noctowl – I need more exposure to the Japanese language. Noctowl is a language-learning Chrome extension that automatically replaces English words on webpages with their translation in the target language. It features Anki-esque flashcards, click-to-pronounce, grammatical form matching, and cloud sync via Firebase/Google sign-on.

Omastar – A March Madness predictor that uses a 7-feature model trained with XGBoost on historical games to beat “picking the higher seed”. The front-end features a bracket that tracks how the model is currently doing. My bracket is currently under-performing.

Available at https://www.bwang.io/omastar/index.html

Farfetch’d – A personal job search tool that scrapes job postings from many boards. Gemini scores each match and lets you review potentially good matches. Application statuses are also tracked.

Machop – Machop matches Polymarket and Kalshi events (fuzzy matching first, then an LLM for exact matching) and surfaces arbitrage opportunities. There used to be trades that were around 50% ROI/year for a couple of hundred dollars in volume, but I’m not sure if you can find that kind of alpha anymore. The video below doesn’t capture the full capabilities of Machop. It was originally entirely in your terminal, but now you can try a read-only version on arbitrage.bwang.io.
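For a sense of the arithmetic behind a cross-venue arbitrage, here is a minimal sketch. It is not Machop's actual code: it assumes binary contracts that pay out 1.00, ignores fees, slippage and differences in resolution rules, and annualizes with simple (non-compounded) scaling. The function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def cross_venue_arb(yes_price: float, no_price: float,
                    days_to_resolution: float) -> Optional[Tuple[float, float]]:
    """Buy YES on one venue and NO on the other for the same event.

    Exactly one of the two contracts pays 1.00, so the pair is an arbitrage
    whenever the combined cost is below 1.00.
    Returns (roi, annualized_roi), or None when there is no edge.
    """
    cost = yes_price + no_price
    profit = 1.0 - cost
    if profit <= 0:
        return None
    roi = profit / cost
    return roi, roi * 365 / days_to_resolution


# Hypothetical prices: YES at 0.45 on one venue, NO at 0.50 on the other.
print(cross_venue_arb(0.45, 0.50, days_to_resolution=60))
```

The hard part in practice is the matching step: two events that look alike can resolve differently, which would turn the "arbitrage" into an ordinary bet.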

Machamp – Trading bot that exploits mispriced sports markets by comparing model odds to Kalshi odds. It automatically places trades when it detects sufficient edge. Features a dashboard so you can watch the model edge alongside the game state.

Metapod – A Jackbox TV style party platform where a host displays a game on TV and people join the game on their phones using a 4-letter room code. The first game is a single-player adventure game, “Lost at Sea”. The other game is “Confidently Wrong”, a Wits and Wagers style game. This game is functional but lacks the sounds and animations that would make it fun.


Here are a few more, briefly listed.

  • Porygon – Visually steps through a BTC transaction using the real Bitcoin testnet and a testnet4 faucet. It generates a wallet and lets you see your transaction in a block.
  • Scyther – Surgical video annotation tool for laparoscopic surgery using CholecSeg8k model for segmentation. Data is bad and my machine isn’t powerful enough so it’s not functional.
  • Ludicolo – Inspired by https://midnight.pub/. Bar simulation where you can sit next to real people at a bar and chat with them (or with the bartender, who is an LLM).
  • Vulpix – TikTok content generator. Turns Google Sheets prompts into TikTok drafts. Not live.
  • Ninetales – Live-stream clipper. Captions clips and turns them into TikTok drafts. Not enough disk space.
  • Tangela – I realized I had an old Reddit API key, so I used it to look for trending topics on Reddit.

Software is cheap now. The role of an engineer now is that of a generalist, a problem solver, a token consumer. Understanding technology is implied, but we increasingly need to interact with the ends instead of the means.

I will continue writing code and building products in the meantime.

Across playing low stakes live and microstakes online, I have won somewhere around 15k in profit playing poker.

I used to have a lengthier post full of EV calculation and theory. This is a much shorter post, more practical.

should I be playing

I recently came across a description of poker tilt in terms of Kahneman’s systems of thinking. The first system is automatic, fast, and emotional. The second system is deliberate, slow, and conscious. We use the first system when we are tilted. We abandon rational thought, becoming disillusioned gamblers.

I think about 10% of players are winners in the long run. In a small enough sample, anyone can be winning. Looking at this variance calculator, we’d need hundreds of thousands of hands to get an approximate win rate. And even then, a theoretically winning player could still be losing. Therefore, poker makes an excellent hobby but a poor source of income.
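To see why the sample needs to be that large, here is a rough sketch of the confidence interval on an observed win rate. It assumes a normal approximation and a standard deviation of 100 bb/100 hands, which is an illustrative figure I am assuming rather than a number from the calculator.

```python
import math


def win_rate_margin(hands: int, std_dev_per_100: float, z: float = 1.96) -> float:
    """Half-width of the confidence interval on win rate, in bb/100.

    z = 1.96 corresponds to a 95% interval under a normal approximation.
    """
    blocks = hands / 100  # number of 100-hand blocks in the sample
    return z * std_dev_per_100 / math.sqrt(blocks)


for hands in (10_000, 100_000, 400_000):
    margin = win_rate_margin(hands, std_dev_per_100=100.0)
    print(f"{hands:>7} hands: observed win rate +/- {margin:.1f} bb/100")
```

Under these assumptions the margin is still about 6 bb/100 at 100,000 hands, which is larger than most realistic win rates, so a true winner can easily show a loss over that stretch.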

Lastly, money. How much money should I have to play? We can't make any bets if we don't have money. To calculate the risk of ruin in poker, I personally vouch for just taking a sample of your last n sessions, getting a mean and variance, and seeing how many standard deviations we are from the mean. We answer questions like “are these games too big for me?”. For a bankroll B, the risk of ruin is F(B) = e^(-2Bm/v), where m is the mean and v is the variance of the session results.
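A minimal sketch of that calculation, reading the first variable in the formula as the bankroll. The session results below are made-up numbers for illustration, bankroll and results must be in the same currency, and the formula itself is a continuous approximation that treats session results as independent draws.

```python
import math
from statistics import mean, variance
from typing import Sequence


def risk_of_ruin(bankroll: float, session_results: Sequence[float]) -> float:
    """Approximate risk of ruin as exp(-2 * bankroll * mean / variance)."""
    m = mean(session_results)
    v = variance(session_results)  # sample variance of per-session results
    if m <= 0:
        return 1.0  # without a positive win rate, this model says we go broke eventually
    return math.exp(-2 * bankroll * m / v)


# Hypothetical results from the last 8 sessions, in dollars.
sessions = [300.0, -200.0, 150.0, -400.0, 500.0, 250.0, -100.0, 100.0]

for bankroll in (1_000, 2_000, 5_000):
    print(f"bankroll {bankroll}: risk of ruin {risk_of_ruin(bankroll, sessions):.1%}")
```

With only a handful of sessions the mean and variance are themselves very noisy, so the output is better read as a sanity check on stakes than as a precise probability.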

This is a thread from Linus, one of the greatest cash game players of all time, before his ascent to greatness. To me, he just seems indomitable even during his rise, ignoring the naysayers, always inquisitive and enjoying the game. And that’s how I think the game should be played.

pot odds

People say poker is a game for the math-inclined, but the only equation you need is pot odds.

  • Pot odds is risk/reward, or b/(p+2b), where b is the bet we face and p is the pot before that bet; this equation is really only applicable on rivers. For example, when villain bets half pot, the pot odds are 0.25. If our call wins more than a fourth of the time, then it’s printing. Sometimes, a villain will give us a great discrepancy between pot odds and how often we should be calling. These villains are called fish.
  • Much less useful than pot odds, but when constructing a bluff on a river with a range advantage, the theoretical bluff to value ratio should be b/(p+b). I mainly use this equation to keep myself from over-bluffing. For example, a half-pot sized bet should have a bluff to value ratio of 1 to 3. Caveats: we need to have a range advantage, and this equation doesn’t account for villain having traps and raising. Consequences: this ignores exploit sizings … the greater the bet size, the greater the EV gained; however, even though theoretically a bigger bet will have more bluffs, when you actually have the nuts, villain is never calling your 3x shove on the river. (This video on trapping frequency is also a relevant watch.)
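Both equations are small enough to check by hand, but a quick sketch makes the half-pot examples concrete. Here `pot` is the pot before the bet and `bet` is the bet size; the function names are my own.

```python
def pot_odds(pot: float, bet: float) -> float:
    """Share of the final pot we pay when calling: b / (p + 2b).

    A river call is profitable when it wins more often than this.
    """
    return bet / (pot + 2 * bet)


def bluff_to_value_ratio(pot: float, bet: float) -> float:
    """Theoretical bluffs per value bet on the river: b / (p + b)."""
    return bet / (pot + bet)


pot = 100.0
for label, bet in (("half pot", 50.0), ("full pot", 100.0), ("2x pot", 200.0)):
    print(f"{label}: need {pot_odds(pot, bet):.0%} equity to call, "
          f"{bluff_to_value_ratio(pot, bet):.2f} bluffs per value bet")
```

For the half-pot case this gives 25% and roughly 0.33 bluffs per value bet, which is the 1 to 3 ratio from the list above.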

preflop

  • In low stack-to-pot ratio situations, we want to have a hand like AQo or 77 for immediate showdown. In high stack-to-pot ratio situations, we can consider hands like 89s or A7s to cooler/stack someone.
  • We should play more hands in position and fewer hands out of position. Ideally, in terms of where we sit, we want (1) aggressive players to the right of us so we can react and (2) passive players to the left of us so our aggression can go unchecked.
  • I just checked my preflop values for a site: my VPIP (voluntarily put in pot %) is 26 and my PFR (preflop raise %) is 18. I don’t think these numbers will deviate very much. A low VPIP/PFR is easier to play; the solver tells us a range advantage equals carte blanche to start blasting. You can consider playing even fewer hands due to rake; I looked at some solver outputs for 5-7x opens in casino rake environments, and there are situations when we should be folding even JJ and TT to a single early position raise.
  • Deviating preflop from optimal solver results is fine if you have a plan. For example, when people behind you only 3bet with premiums, it’s fine to just call in position. I also employ a light 3bet squeeze when there is a raise from a weak range and too many callers, specifically isolating a weak player.

general

There are some wizards of the game, but I think a general sense of how to play optimally is good enough.

  • In general, tight aggressive is the correct style to play.
  • People at my stake tend to call with draws and raise with made hands.
  • Raises on the river are severely under-bluffed.
  • Under-bluffed population lines (4 to a flush, 4 to a straight, double paired boards, etc.) are good bluff lines.
  • Most EV is won by being in position on fish, usually a LAG type of player.
  • There are some BXB (bet flop, check turn, bet river) and BB lines that are going to be profitable against most population villains, especially on dry paired boards (villain unlikely to connect) or blind vs blind (villain range too wide). However, I personally like to be villain specific when going for these red-line exploits.
  • Raises are going to generate more folds than bets. A small raise can be really effective against a weak polar range.
  • Bigger bets are an exploit against villains who over-call.
  • Some players will respond to absolute sized bets more so than relative sized bets. E.g. A 300 dollar bet into a 600 dollar pot might be very big for someone. Sometimes on rivers, you might need to size down to get called.
  • I have a smaller sizing when I am trying to be raised or if villain is under-calling. I have a larger sizing to get called by lower equity hands or to generate folds.
  • More important than balance is knowing your image and knowing your opponent.

multiway

Many hands are played multi-way with nonstandard sizing.

  • Use smaller bet sizing and bluff less in general multi-way. Facing a half-pot bet heads-up, we should call about 66% of the time. When 5 players are facing the bet, each should call about 20% of the time. Realistically, this number ought to be less. Half pot is big multi-way, but over-fold against even bigger bets.
  • If we have a very strong hand and the board is likely to be bet, we just want to check (to raise) to cooler someone. If it’s unlikely to be bet, we want to bet ourselves. If we have a good hand (but not a very strong hand), we can consider betting ourselves so we can react to a raise. In general, a lot of fishy villains telegraph hand strength from bet sizing or raises. You want to play a reactionary game against fishy players. Let them act, and then you get to react almost perfectly.
  • The person immediately calling a bet in a multi-way pot has a strong range. Given most situations, I usually just fold middle pair or worse.
  • Obviously, shift your range to be value heavy against villains who over-call. But since people over-fold to raises, it’s often good to check-raise when we block a really strong hand on boards where villain is capped or doesn’t want to put more money in.
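The 66% and 20% calling frequencies in the first bullet are consistent with one common way of splitting the defense between players, sketched below. The derivation is my assumption, not something spelled out in the post: the bettor's pure bluff breaks even when everyone folds b/(p+b) of the time, and each defender is assumed to fold independently at the same rate.

```python
def required_fold_frequency(pot: float, bet: float) -> float:
    """How often everyone must fold for a pure bluff to break even: b / (p + b)."""
    return bet / (pot + bet)


def per_player_call_frequency(pot: float, bet: float, defenders: int) -> float:
    """Calling frequency per defender that holds the combined folds at break-even.

    Assumes each defender folds independently and equally often, so the
    per-player fold rate is the break-even fold rate to the power 1/defenders.
    """
    alpha = required_fold_frequency(pot, bet)
    return 1 - alpha ** (1 / defenders)


pot, bet = 100.0, 50.0  # a half-pot bet
for defenders in (1, 2, 5):
    freq = per_player_call_frequency(pot, bet, defenders)
    print(f"{defenders} defender(s): each calls {freq:.1%}")
```

In practice the burden is not shared equally, since the player closing the action can defend wider than players who still have opponents behind them.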

Actual strategies are villain specific and more nuanced. But this should be a good start.

live tells

I have been getting better at live tells. Here are some examples:

  • Villain reaching for chips defensively when you start to bet is weak.
  • A fishy player might say something regarding how weak they are. “I’m on a draw”, “I was afraid you’d snap call” are all signs of strength.
  • A player making a nervous move when they think you are not watching is much more indicative of weakness than if they act in an obvious manner. People know they are being watched after they make a big bet so you have to look for tells they can’t control (still palpitating after a few minutes might indicate a bluff, etc.).
  • Never believe what people have unless they show.

other

  • Be aware of collaboration. Watch for two players who always enter the pot together and show suspicious betting patterns (to bully a third person out of the pot, or to maximize the value of one player’s nutted hand by getting multiple players to call). I think seeing showdown hands played in ways that don’t make sense can give you signs of collaboration.
  • Be aware of cheating; I put cheating in a different category from collaboration. I have been cheated twice in home games (once in Asia and once in NYC) due to some sort of dealer mechanics or marked cards. I don’t play in home games anymore.
  • PT4 is pretty good software to collect your hands. There are nice discord channels you can find to talk about hands.