Machine-Visioning Match Game With Anthropic’s API

When I first started learning JavaScript two and a half years ago, I had a couple search tools I definitely knew I wanted to make. Wouldn’t it be cool, I thought, if I could manage to think of five tools to make, or maybe, maybe even ten. And as I went along, every time I finished a tool I thought to myself “Oh no, that was the last one, I will never think of another one,” until a couple days later when I did think of another one and kept on going.

And now, between all my various doings (see Project List for details), I have several dozen dibs and dabs available, plus bits of scratch paper containing more ideas, and when I get that thought in my head I tell it to shut up.

But sometimes I do get feeling a little stuck, like I’m going over the same ground over and over again and not learning much as I make something. When I get in that cul-de-sac I like to look for something really stupid to do that will help me learn new skills. Like, 100% dumb. The stakes could not be lower, so enjoy yourself.

What better choice for such a context than a 1970s game show?

“Old Man Periwinkle Said…”

Match Game has had several incarnations but the one I’m referring to was a 1970s game show hosted by Gene Rayburn. It was very vulgar and funny and so, so, so, 1970s. Don’t ask me to justify the fashion, I got nothing.

Anyway, the way the game worked was you had a couple of contestants:

Two contestants sitting at the Match Game contestant desk. One desk has a red circle and the other has a green triangle. Besides them, Gene Rayburn is standing talking to them, holding the skinny microphone which was his trademark.

(By the way, the video these screenshots came from is from the Match Game Productions YouTube channel. They are awesome and you should subscribe to them.)

The contestants were asked fill-in-the-blank questions which were rife with risque possibilities, and had to match their answers to that of a celebrity panel:

Celebrity panel from a 1976 issue of Match Game. Celebrities are Ed Asner, Brett Somers, Charles Nelson Reilly, Debralee Scott, Richard Dawson, and Patti Deutsch.

In between there was a lot of messing around, celebrities promoting their products, etc. The panel members had lunch in the middle of the day (the shows were shot a week at a time) and there was a lot of alcohol available; this is apparent in some of the episodes.

Anyway, the celebrities wrote their answers on blue cards and held them up to the camera and I thought, “That would be an interesting use of the machine vision part of Anthropic’s API and I could learn a bit more about building JSONs and also this is an opportunity for some silly pop culture analysis so away we go.”

Finding the Blue Cards

Screenshot from Match Game Productions. It shows Patti Deutsch, usually in the last position on the panel, showing her blue card which has an answer (potential match) written on it. It reads ORTHOPEDIC SHOES.

The first thing I did was use FFmpeg to slice a test episode of Match Game into component images, one per second of video so I had something the Anthropic API could evaluate. (One episode at one second per image is about 2000 images.) Then I tried two approaches and went with the second one.
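
For the curious, the incantation is something like the line below. The filenames here are just placeholders; the fps=1 filter is what gives you one frame per second.

    ffmpeg -i episode.mp4 -vf fps=1 frames/frame_%04d.jpg

(FFmpeg won’t create the frames/ directory for you, so make it first.)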

With the first approach, the Anthropic API’s machine vision made one pass through a group of test images to identify those with blue cards. Then it made a second pass through the blue-card group to identify the cards positioned well enough to be readable. Finally, on the third pass, it applied OCR to the readable cards.

That method worked really well but I decided not to use it. First, Anthropic’s API is not free and three iterations of the same group of images is expensive, especially if you’re just eliminating cards and not getting any information out of them. Second, the iterations didn’t seem as necessary as I thought they might be; the lighting of the images is good since it’s a TV show and the celebrities turn their cards so they’re directly facing the camera. Even when an image captures a card in mid-turn, the card is usually held in place long enough for a subsequent image to get a good view.

Instead, I decided to do a single iteration of each set of images. And since I was doing only a single iteration I decided to gather more information than what was simply on the blue cards.

Gathering Data From the Images

I know I want the text on the blue cards. What other information do I want to gather from these images?

Anthropic’s API is face blind. That is to say, it will not identify people in images. As far as I can tell it won’t even gender people. So I can’t start by asking the API to identify the person in the image. However, I can ask it to look for one of the nameplates that each celebrity had in front of them:

Screenshot from Match Game. This image shows Debralee Scott and Richard Dawson sitting side by side. Debralee looks like she's laughing and Richard Dawson is yelling at the camera. They both hold blue cards reading "Finishing School" and their nameplates are clearly visible in front of them.

And as long as I’m here, why don’t I gather up some information about what people are wearing, since the 1970s were a different time when men would wear plaid suits in public and it was fine? (It’s still fine with me — as long as your bits are covered I don’t give a damn what you wear.)

Gene Rayburn wearing this sandy tannish orange suit with a faint blue and peach plaid, a taupe vest, and an orangey-red tie.

Crafting the Anthropic Prompt

I ended up writing a really long prompt for what I want the Anthropic API to do for each image. It gathers the following:

  • A boolean value for hasBlueCard
  • Information about visible cards and their texts in a cardPositions array
  • Information about visible nameplates and their texts in a visibleNameplates array
  • A ridiculous amount of information about what each person is wearing in a fashion array.
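
To give you the flavor of it without pasting the whole wall of text, here’s a stripped-down sketch of the single pass in Node using Anthropic’s JavaScript SDK. The model name, file path, and the much-abbreviated prompt below are placeholders, not my production code:

    // A stripped-down sketch of the single pass, using Anthropic's JavaScript SDK.
    // The model name, file path, and this abbreviated prompt are placeholders;
    // the real prompt describes every field at much greater length.
    import fs from "node:fs";
    import Anthropic from "@anthropic-ai/sdk";

    const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    const prompt = `Analyze this game show frame. Respond with JSON only, shaped like:
    {
      "hasBlueCard": true,
      "cardPositions": [{"position": "", "text": "", "textConfidence": 0, "isBlueCard": true}],
      "visibleNameplates": [{"position": "", "text": "", "textConfidence": 0}],
      "fashion": [{"position": "", "garmentDescription": "", "patterns": [], "colors": [],
                   "distinctiveFeatures": [], "accessories": [], "eraSpecificElements": [],
                   "visible_portion": "", "confidence": 0}]
    }`;

    async function analyzeFrame(imagePath) {
      const response = await anthropic.messages.create({
        model: "claude-3-5-sonnet-latest", // whichever vision-capable model you use
        max_tokens: 1024,
        messages: [{
          role: "user",
          content: [
            { type: "image",
              source: { type: "base64", media_type: "image/jpeg",
                        data: fs.readFileSync(imagePath).toString("base64") } },
            { type: "text", text: prompt },
          ],
        }],
      });
      // Assumes the model returned bare JSON, which the prompt asks for.
      return JSON.parse(response.content[0].text);
    }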

Does it Work?

It works pretty good! Let’s take a look at that picture of Debralee Scott and Richard Dawson again:

Screenshot from Match Game. This image shows Debralee Scott and Richard Dawson sitting side by side. Debralee looks like she's laughing and Richard Dawson is yelling at the camera. They both hold blue cards reading "Finishing School" and their nameplates are clearly visible in front of them.

Here’s how the machine vision analysis of this image turned out, starting with blue card data:

"hasBlueCard": true,
"cardPositions": [
  {
    "position": "upper_left",
    "text": "school",
    "textConfidence": 1.0,
    "isBlueCard": true
  },
  {
    "position": "upper_right",
    "text": "lid?",
    "textConfidence": 0.9,
    "isBlueCard": true
  }
],

It didn’t OCR the cards very well, but considering the handwriting I’m not too surprised. How about the nameplates?

"visibleNameplates": [
  {
    "position": "lower_left",
    "text": "DEBRA LEE",
    "textConfidence": 1.0
  },
  {
    "position": "lower_right",
    "text": "RICHARD",
    "textConfidence": 1.0
  }
],

Almost perfect! It added a space to DEBRALEE, I suspect because that’s a more traditional spelling. Finally, what does the fashion portion of the JSON response look like?

"fashion": [
  {
    "position": "lower_left",
    "garmentDescription": "A floral print dress with long sleeves",
    "patterns": ["floral"],
    "colors": ["brown", "orange", "yellow"],
    "distinctiveFeatures": ["wide collar"],
    "accessories": [],
    "eraSpecificElements": ["bold floral print", "long sleeves"],
    "visible_portion": "full body",
    "confidence": 0.9
  },
  {
    "position": "lower_right",
    "garmentDescription": "A brown suit jacket over a light-colored shirt",
    "patterns": [],
    "colors": ["brown", "tan"],
    "distinctiveFeatures": ["wide lapels"],
    "accessories": [],
    "eraSpecificElements": ["wide lapels", "earth tones"],
    "visible_portion": "upper body",
    "confidence": 0.8
  }
]

That worked really well, especially since I never intended to gather that information in the first place!

A Couple of Bumps

I have hit a couple of snags. You might have noticed that the screenshots have a lot of old watermarks on them from the original GSN airing, plus an additional mark from Match Game Productions. Those watermarks sometimes confuse the API when it’s looking for a blue card, so I need to tighten the parameters on what counts as a blue card.
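
I haven’t written that fix yet, but it will probably amount to giving the prompt a stricter definition of a card, something along these lines:

    // Hypothetical stricter criteria (not final wording), meant to keep
    // translucent logos and watermarks from being counted as cards:
    const blueCardRules = `A blue card is a solid light-blue, rectangular card
    held up in a person's hands, with large handwritten text on it. Translucent
    station logos, channel watermarks, and other on-screen graphics are NOT
    blue cards.`;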

Also, if I analyzed the entire series of Match Game using the Anthropic API, it would be really, really expensive. I have managed to restrict the API costs to a grand total of 58 cents while getting this program working, but that would change a lot if I tried to analyze an entire season.

Still, I find myself intrigued by the information gathered by the API, and I’d like to generate a large dataset so I can splash around in it and see what happens. I might take a half-dozen episodes from a single season and bake them into a large dataset for further exploration later. Stay tuned.
