Dense Captioning of Video Demonstrating the Upgraded Boston Dynamics Atlas Robot

Artist and programmer Gene Kogan ran the Boston Dynamics video demonstrating their upgraded Atlas robot through DenseCap, a dense captioning system that detects regions in an image and describes each one in natural language. The results are both impressive and at times wildly inaccurate: in the resulting video the robot is variously labeled as a person skiing, a motorcycle, or a fire hydrant.

Captions are generated by densecap on individual video frames. The video is made by a python script which merges matching captions along sequences of consecutive frames with a set of (mostly greedy) heuristics. Presumably, it would be possible to caption sequences of regions directly rather than a naive merging algorithm, but I'm not sure how 🙂
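For readers curious what such a greedy merge might look like, here is a minimal sketch. It is not Kogan's script, whose heuristics aren't described in detail. It assumes each frame yields a list of `(caption, box)` pairs, and it extends a caption from one frame to the next when the text matches exactly and the boxes overlap by at least a threshold. The names `Track`, `iou`, `merge_captions`, and the `min_iou` parameter are all hypothetical.

```python
from dataclasses import dataclass

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    caption: str
    start: int  # first frame index where the caption appears
    end: int    # last frame index where the caption appears
    box: Box    # most recent box, used to match the next frame


def merge_captions(frames, min_iou=0.5):
    """Greedily merge identical captions across consecutive frames."""
    finished, active = [], []
    for idx, detections in enumerate(frames):
        unmatched = list(detections)
        still_active = []
        for track in active:
            # Pick the best-overlapping detection with the same caption.
            best, best_score = None, min_iou
            for det in unmatched:
                caption, box = det
                if caption == track.caption:
                    score = iou(track.box, box)
                    if score >= best_score:
                        best, best_score = det, score
            if best is None:
                finished.append(track)  # caption vanished: close the track
            else:
                unmatched.remove(best)
                track.end, track.box = idx, best[1]
                still_active.append(track)
        # Detections that matched nothing start new tracks.
        for caption, box in unmatched:
            still_active.append(Track(caption, idx, idx, box))
        active = still_active
    finished.extend(active)
    return sorted(finished, key=lambda t: (t.start, t.caption))


frames = [
    [("a robot", (0, 0, 10, 10))],
    [("a robot", (1, 0, 11, 10))],
    [("a fire hydrant", (1, 0, 11, 10))],
]
tracks = merge_captions(frames)
```

With the toy input above, the two "a robot" detections collapse into one caption spanning frames 0 and 1, while "a fire hydrant" becomes its own single-frame caption. Because matching is greedy and requires identical text, a caption that flickers between two phrasings would be split into many short runs, which is the kind of limitation the "naive merging" remark points at.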

via Prosthetic Knowledge
