The transformer in CoTracker is designed to iteratively update estimates of several trajectories jointly and can be applied to very long videos in a sliding-window manner, for which the authors developed an unrolled training procedure. CoTracker outperforms other point tracking methods in both efficiency and accuracy, demonstrating its potential for video motion prediction.
Key takeaways:
- The paper introduces CoTracker, an architecture that tracks multiple points throughout an entire video, improving upon methods that track points individually.
- CoTracker is based on a transformer network that models correlations between different points over time using specialised attention layers.
- The transformer is designed to iteratively update an estimate of several trajectories and can be applied to very long videos in a sliding-window manner.
- CoTracker performs favorably against other point tracking methods in terms of efficiency and accuracy.
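The sliding-window idea above can be sketched in a few lines. This is a toy illustration, not CoTracker's actual implementation: `refine_tracks` is a hypothetical stand-in for the iterative transformer update (here it simply moves estimates toward synthetic per-frame targets), and the window/stride mechanics are simplified. The key points it shows are that each window is initialised from the previous window's overlapping estimates, and that the update is applied iteratively within each window.

```python
import numpy as np

def refine_tracks(targets, tracks, num_iters=4):
    # Hypothetical stand-in for CoTracker's transformer: each iteration
    # nudges every trajectory estimate toward the per-frame targets.
    for _ in range(num_iters):
        tracks = tracks + 0.5 * (targets - tracks)
    return tracks

def track_video(targets, init_points, window=8, stride=4):
    # Slide a fixed-size window over the video. Consecutive windows
    # overlap by (window - stride) frames, so each new window starts
    # from the previous window's refined estimates on the overlap.
    T = targets.shape[0]
    # Initialise every frame's estimate from the query points: (T, N, 2).
    tracks = np.tile(init_points, (T, 1, 1)).astype(float)
    start = 0
    while start < T:
        end = min(start + window, T)
        tracks[start:end] = refine_tracks(targets[start:end],
                                          tracks[start:end])
        if end == T:
            break
        start += stride
    return tracks
```

Because windows overlap, frames in the overlap region are refined more than once, which is what lets estimates propagate smoothly across window boundaries in very long videos.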