Winter 2015 Reading Group

Time: Thursdays, 10AM to 12PM (noon)

Place: DBH 4013

Guidelines

At least one week prior to your presentation, please fill out the papers/topics that you plan to present at the meeting.

Schedule

Week 1 - James - Jan 8th (10AM to 12PM) - DBH 4013

Topic: Determinantal point processes for machine learning

Abstract: Determinantal point processes (DPPs) are elegant probabilistic models of repulsion that arise in quantum physics and random matrix theory. In contrast to traditional structured models like Markov random fields, which become intractable and hard to approximate in the presence of negative correlations, DPPs offer efficient and exact algorithms for sampling, marginalization, conditioning, and other inference tasks. We provide a gentle introduction to DPPs, focusing on the intuitions, algorithms, and extensions that are most relevant to the machine learning community, and show how DPPs can be applied to real-world applications like finding diverse sets of high-quality search results, building informative summaries by selecting diverse sentences from documents, modeling non-overlapping human poses in images or video, and automatically building timelines of important news stories.
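
To make the determinant-based notion of repulsion concrete, here is a minimal numpy sketch (not part of the talk) of the L-ensemble formulation, in which the probability of selecting a subset S is det(L_S) / det(L + I); the toy kernel and feature vectors below are invented purely for illustration.

```python
# Illustrative sketch only: L-ensemble DPP subset probabilities, P(S) = det(L_S) / det(L + I).
import numpy as np

def dpp_log_prob(L, S):
    """Log-probability that an L-ensemble DPP with kernel L selects exactly the subset S."""
    L = np.asarray(L, dtype=float)
    L_S = L[np.ix_(S, S)]                                # principal submatrix indexed by S
    _, logdet_S = np.linalg.slogdet(L_S)                 # log det(L_S): "volume" spanned by S
    _, logdet_Z = np.linalg.slogdet(L + np.eye(len(L)))  # log normalizer det(L + I)
    return logdet_S - logdet_Z

# Toy kernel built from item feature vectors: items 0 and 1 are nearly identical, item 2 is different.
phi = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]])
L = phi @ phi.T
print(dpp_log_prob(L, [0, 1]))   # low: the redundant pair is "repelled"
print(dpp_log_prob(L, [0, 2]))   # higher: the diverse pair is favored
```

Diverse subsets span more "volume" under the kernel, so the nearly duplicate pair {0, 1} receives much lower probability than the diverse pair {0, 2}, which is the behavior the abstract describes.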

Week 2 - Olga Russakovsky - Jan 15th (DBH 4011 @ 11AM)

Topic: Designing and Overcoming Challenges in Large-Scale Object Detection

Abstract: There are four key components when tackling large-scale visual recognition: (1) the data needs to have sufficient variety to represent the target world, (2) the algorithms need to be powerful enough to learn from this data, (3) the data needs to be annotated with enough information for algorithms to learn from, and (4) the algorithms need to be fast enough to process this large data. I’ll talk about my PhD work on each of these four components in the context of large-scale object detection.

The main part of the talk will focus on the difficulties of collecting and annotating diverse data when designing the object detection task of the ImageNet Large Scale Visual Recognition Challenge (http://image-net.org/challenges/LSVRC/). ILSVRC is a benchmark in object classification and detection on hundreds of object categories and millions of images. The challenge has been run annually since 2010, attracting participation from more than fifty institutions. In 2014 the challenge had a record number of submissions (123 entries from 36 teams) and appeared in international media including the New York Times, MIT Technology Review and CBC/Radio-Canada. In this talk I’ll describe some of the challenges of scaling up the object detection task of ILSVRC by more than an order of magnitude compared to previous datasets (e.g., the PASCAL VOC).

I will conclude by discussing my current work and future plans. As computers are becoming exceedingly good at recognizing many object categories, I argue that it’s time to step back and consider the long tail: what pieces are still missing and preventing us from recognizing every object and understanding every pixel in an image? I believe that we’ll need to closely examine the performance of our algorithms on individual classes (instead of focusing on average accuracy across hundreds of categories), revisit the idea of object description rather than categorization, and consider the advantages of close human-machine collaboration.

Bio: Olga Russakovsky (http://ai.stanford.edu/~olga) is a PhD student at Stanford University advised by Professor Fei-Fei Li. Her main research interests are in large-scale object detection and recognition. For the past two years she has been the lead organizer of the international ImageNet Large Scale Visual Recognition Challenge which has been featured in the New York Times, MIT Technology Review, and other international media venues. She has organized several workshops at top-tier computer vision conferences: the ImageNet challenge workshop at ICCV’13 and ECCV’14, the upcoming workshop on Large-Scale Visual Recognition and Retrieval at CVPR’15, and the new Women in Computer Vision workshop at CVPR’15. During her PhD she collaborated closely with NEC Laboratories America and with Yahoo! Research Labs. She was awarded the NSF Graduate Fellowship and the CRA undergraduate research award.

Week 3 - Bailey - Jan 23rd

Paper:

Abstract:

Week 4 - Sam - Jan 29th

Topic:

Abstract:

Week 5 - Shu - Feb 5th

Paper: Beyond R-CNN detection: Learning to Merge Contextual Attribute

Abstract: We will briefly review R-CNN [1], which in effect performs classification over thousands of objectness regions extracted from the image. We will see what it misses: the interaction between objects and the context within the image. When contextual information is used in addition to the CNN, performance improves [2]. This is also supported by a recent and interesting study [3], which compares action classification performance between state-of-the-art computer vision methods and a linear SVM trained on fMRI data. The conclusions of that paper are all interesting, but we emphasize the most “trivial” yet convincing one: the human brain exploits semantic inference for action classification, which is absent from computer vision methods for the same task. Exploiting contextual information is therefore a reasonable next step toward improving detection. But how can we represent, extract and utilize contextual information? To answer these questions, I will present two other papers that are seemingly unrelated to them. The first is [4], which shows how to represent, learn and use texture attributes to improve texture and material classification; the second is [5], which uses patch-match techniques for chair detection at a finer level. Based on these two papers, we will try to answer the questions: how can we represent, learn and use contextual information to boost detection?
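
As a purely schematic illustration of one place context could enter an R-CNN-style pipeline, the sketch below (not taken from any of the cited papers) concatenates a whole-image descriptor onto each proposal's feature before a linear per-class classifier; every name, shape, and the late-fusion scheme itself are hypothetical.

```python
# Hypothetical late-fusion sketch: per-proposal CNN features plus a global "context" feature.
import numpy as np

def score_regions(region_feats, image_feat, W, b):
    """Score each proposal against C classes using its own feature plus a whole-image context feature.

    region_feats: (R, D) per-proposal CNN features
    image_feat:   (D,)   whole-image CNN feature, used as context
    W: (C, 2*D) linear classifier weights; b: (C,) biases
    Returns an (R, C) matrix of class scores.
    """
    ctx = np.tile(image_feat, (region_feats.shape[0], 1))   # same context row for every proposal
    fused = np.concatenate([region_feats, ctx], axis=1)     # late fusion by concatenation
    return fused @ W.T + b                                   # linear per-class scoring

rng = np.random.default_rng(0)
R, D, C = 5, 8, 3                                            # 5 proposals, 8-dim features, 3 classes
scores = score_regions(rng.normal(size=(R, D)), rng.normal(size=(D,)),
                       rng.normal(size=(C, 2 * D)), np.zeros(C))
print(scores.shape)                                          # (5, 3): one score per proposal per class
```

This baseline only shows where a context representation could be merged into per-region scoring; the questions raised in the abstract are about finding better representations than a simple global descriptor.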

Week 6 - Minhaeng - Feb 12th

Paper: Knowing a good HOG filter when you see it: Efficient selection of filters for detection (http://ttic.uchicago.edu/~smaji/papers/goodParts-eccv14.pdf)

Abstract:

Week 7 - Phuc - Feb 19th @ 10AM

Paper:

Abstract:

Week 7 - Yi - Feb 19th @ 5PM in DBH 4013

Topic: Deep learning!

Abstract:

Week 8 - Peiyun - Feb 26th

Paper: Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”. Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they can directly map variable-length inputs (e.g., video frames) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
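
As a minimal sketch of the “doubly deep” idea (a per-frame convnet feeding a recurrent layer that integrates over time), the toy model below is written in PyTorch purely for illustration; it is not the authors' implementation, and the tiny convolutional encoder stands in for a full visual convnet.

```python
# Illustrative CNN-per-frame + LSTM-over-time sketch; names and sizes are made up.
import torch
import torch.nn as nn

class TinyLRCN(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128, num_classes=10):
        super().__init__()
        # Per-frame convolutional encoder (stands in for a large visual convnet).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Recurrent layer integrates the frame features over time.
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, video):                     # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))     # encode every frame: (B*T, feat_dim)
        feats = feats.view(b, t, -1)              # regroup into sequences: (B, T, feat_dim)
        out, _ = self.rnn(feats)                  # temporal dynamics: (B, T, hidden_dim)
        return self.classifier(out[:, -1])        # classify from the final time step

model = TinyLRCN()
logits = model(torch.randn(2, 8, 3, 64, 64))      # 2 clips of 8 frames each
print(logits.shape)                               # torch.Size([2, 10])
```

Because the recurrent layer consumes one frame feature per time step, the same parameters handle clips of any length, which is the property the abstract highlights for mapping variable-length inputs to outputs, and both the encoder and the LSTM can be trained jointly with backpropagation.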

Week 9 - Raul - Mar 5th

Paper:

Abstract:

Week 10 - Greg - Mar 12th

Paper:

Abstract:

Week 11 - Mohsen - Mar 19th

Paper:

Abstract: