[Image: Boston harbor]

CVPR from a Robotic Vision Perspective

The CVPR conference in Boston, one of the premier computer vision conferences, was all about convolutional neural networks and deep learning. These new (or not so new) techniques seem to be doing everything from image classification to scene understanding. Although the vision community has not shown much interest in robotic applications so far, I had the feeling that this is changing (slowly, at least).

tl;dr: CVPR is huge, with convolutional neural networks everywhere; they are now the de-facto standard for tackling computer vision problems. CV research is getting easier to reproduce thanks to open source code AND models. There is a trend towards investigating what is going on inside these networks, and also a trend towards more robotic (real-world) applications of vision.
My longer write-up of #CVPR2015 is after the break. Others have done similar things: a great write-up by Tomasz Malisiewicz, and another one by Zoya Bylinskii listing interesting CVPR 2015 papers.

One clearly visible impact of the extensive use of computer vision techniques in industry was the presence of quite a few internet heavy-weights, such as Amazon, Baidu, Google, Facebook, Microsoft, Tesla, … (Google has a blog post listing their involvement in CVPR papers, workshops and tutorials.) There seems to be plenty of demand for computer vision researchers. The conference itself was split into tutorials (1 day), the main conference (3 days) and workshops (2 days).

Plenaries

The highly anticipated plenary talk by Yann LeCun, about what is wrong with deep learning, was, though interesting, a bit of a let-down, especially given the intriguing title. The second plenary was given by Jack Gallant, a neuroscientist from UC Berkeley. His talk featured the work in his lab, which focuses on understanding human visual perception, especially its mid-level parts, using fMRI scans.

Tutorials

The tutorials followed this trend, and the rooms were bursting during the presentations of Caffe, a deep learning library, and Torch7, a machine learning library originating at IDIAP and now heavily used at Google, Facebook, Twitter and others. The presenters' slides can be found on the respective webpages (caffe, torch7).
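Part of why results have become so much easier to reproduce is how little code it now takes to run a released model. As a rough illustration, here is a minimal sketch of classifying an image with a pretrained network through Caffe's Python interface; the file names (deploy.prototxt, weights.caffemodel, cat.jpg) are placeholders for whatever released model and test image you have at hand, and the 'data'/'prob' blob names are assumptions that match common classification nets, not any specific paper's code.

import caffe

caffe.set_mode_cpu()  # use caffe.set_mode_gpu() if a GPU is available

# Placeholder files: substitute the network definition and weights that
# ship with a released model (e.g. from the Caffe model zoo).
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Caffe nets typically expect C x H x W, BGR, 0-255 input, while
# caffe.io.load_image returns H x W x C, RGB, in [0, 1]; convert accordingly.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_channel_swap('data', (2, 1, 0))
transformer.set_raw_scale('data', 255.0)

image = caffe.io.load_image('cat.jpg')  # placeholder image
net.blobs['data'].data[...] = transformer.preprocess('data', image)

# One forward pass; 'prob' is the usual output blob of classification nets.
out = net.forward()
print('predicted class index:', out['prob'].argmax())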

Interesting Papers

This is a list of interesting papers and posters that caught my eye. The program listing all papers can be found here.

Papers that sound interesting

Limited time means I have not read all of the following, but they sound interesting :) (this might change once I get to read more). I tried to categorize them a little (action recognition is obviously quite interesting for us in robotics).

Vision and Actions

Grasp Type Revisited: A Modern Perspective on a Classical Feature for Vision
How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps
Learning a Non-Linear Knowledge Transfer Model for Cross-View Action Recognition
Watch-n-Patch: Unsupervised Understanding of Actions and Relations
Joint Tracking and Segmentation of Multiple Targets
Finding Action Tubes
Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition
End-to-End Integration of a Convolution Network, Deformable Parts Model and Non-Maximum Suppression
Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction
Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition
Towards Force Sensing From Vision: Observing Hand-Object Interactions to Infer Manipulation Forces
Can Humans Fly? Action Understanding With Multiple Classes of Actors
Unsupervised Learning of Complex Articulated Kinematic Structures Combining Motion and Skeleton Information
Elastic Functional Coding of Human Actions: From Vector-Fields to Latent Variables
Motion Part Regularization: Improving Action Recognition via Trajectory Selection
Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors
First-Person Pose Recognition Using Egocentric Workspaces
Learning To Look Up: Realtime Monocular Gaze Correction Using Machine Learning
Human Action Segmentation With Hierarchical Supervoxel Consistency

Video and Actions

Modeling Video Evolution for Action Recognition
Watch and Learn: Semi-Supervised Learning for Object Detectors From Video
Ego-Surfing First-Person Videos
Nested Motion Descriptors
Dynamically Encoded Actions Based on Spacetime Saliency
Superpixel-Based Video Object Segmentation Using Perceptual Organization and Location Prior
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
Classifier Based Graph Construction for Video Segmentation
Pooled Motion Features for First-Person Videos
Joint Action Recognition and Pose Estimation From Video
Fast Action Proposals for Human Action Detection and Search
A Discriminative CNN Video Representation for Event Detection
The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose
Learning to Segment Moving Objects in Videos
Learning to Detect Motion Boundaries

Object Detection/Segmentation/Recognition

Hypercolumns for Object Segmentation and Fine-Grained Localization
Is Object Localization for Free? – Weakly-Supervised Learning With Convolutional Neural Networks
Learning Coarse-to-Fine Sparselets for Efficient Object Detection and Scene Classification
JOTS: Joint Online Tracking and Segmentation
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
Enriching Object Detection With 2D-3D Registration and Continuous Viewpoint Estimation
Virtual View Networks for Object Reconstruction
Recurrent Convolutional Neural Network for Object Recognition
Learning to Segment Under Various Forms of Weak Supervision

Understanding DeepNets

Understanding Deep Image Representations by Inverting Them
DEEP-CARVING: Discovering Visual Attributes by Carving Deep Neural Nets
On the Relationship Between Visual Attributes and Convolutional Networks
Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images
Going Deeper With Convolutions
Understanding Image Representations by Measuring Their Equivariance and Equivalence
A Dynamic Programming Approach for Fast and Robust Object Pose Recognition From Range Images
Discovering States and Transformations in Image Collections

Image Captions

Deep Visual-Semantic Alignments for Generating Image Descriptions
From Captions to Visual Concepts and Back
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

Other

Curriculum Learning of Multiple Tasks
Separating Objects and Clutter in Indoor Scenes
Adaptive Eye-Camera Calibration for Head-Worn Devices
Toward User-Specific Tracking by Detection of Human Shapes in Multi-Cameras
Unsupervised Object Discovery and Localization in the Wild: Part-Based Matching With Bottom-Up Region Proposals
Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks
Fully Convolutional Networks for Semantic Segmentation
Viewpoints and Keypoints
ConceptLearner: Discovering Visual Concepts From Weakly Labeled Image Collections
Large-Scale Damage Detection Using Satellite Imagery
Towards Open World Recognition

Workshops

The workshops were, in my opinion, really the most interesting part of the conference. Lively discussions were ubiquitous, great ideas were presented, and there was even room for robotics!!

tba
