Time: Thursday 3:30-5:30pm.
Place: DBH3013
===== Guidelines =====
**Abstract**: Automatic analysis of facial expression in a realistic scenario is a difficult problem because the 2-D imagery of human facial expression couples rigid head motion with non-rigid muscle motion. We are tasked to solve this “coupled-motion” problem and analyze facial expression in a meaningful manner. We first propose an image-based representation, the Emotion Avatar Image, to aid person-independent expression recognition. This method allows us to analyze facial expression in a canonical space, which makes the comparison of corresponding features more accurate and reasonable. Second, a real-time registration technique is designed to improve frame-based streaming facial action unit (AU) recognition. We do not always have the luxury of obtaining temporally segmented discrete facial expressions, e.g., joy or surprise. The project introduces a frame-based method for registration: it not only aligns faces (or objects in general) to a reference, but also guarantees temporal smoothness, both of which are essential for spontaneous expression analysis. Third, the proposed expression recognition techniques are then applied to the field of advertising, where facial expression is demonstrated to be closely correlated with the commercial-viewing behavior of audiences.
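
The canonical-space analysis in the first contribution boils down to registering each face to a common reference. Below is a minimal sketch of that registration step as a landmark-based similarity (Procrustes) alignment; the landmark coordinates and canonical template are hypothetical, and the actual system may use a different alignment model.

<code python>
# Minimal sketch: align detected facial landmarks to a canonical reference
# via a least-squares similarity transform (scale + rotation + translation).
# The landmark coordinates below are hypothetical; a real pipeline would
# obtain them from a face/landmark detector.
import numpy as np

def similarity_align(src, dst):
    """Return scale s, rotation R, translation t mapping src -> dst in the
    least-squares sense: dst ~ s * src @ R.T + t (orthogonal Procrustes)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)     # cross-covariance of the point sets
    d = np.ones(src.shape[1])
    if np.linalg.det(U @ Vt) < 0:                 # guard against reflections
        d[-1] = -1
    R = U @ np.diag(d) @ Vt                       # optimal rotation
    s = (S * d).sum() / (src_c ** 2).sum()        # optimal isotropic scale
    t = mu_d - s * mu_s @ R.T
    return s, R, t

# Hypothetical 2-D landmarks (eye corners, nose tip, mouth corners) and a
# hypothetical canonical template.
face  = np.array([[112., 140.], [168., 138.], [140., 170.], [118., 200.], [162., 202.]])
canon = np.array([[ 30.,  40.], [ 70.,  40.], [ 50.,  60.], [ 35.,  80.], [ 65.,  80.]])

s, R, t = similarity_align(face, canon)
aligned = s * face @ R.T + t                      # landmarks in the canonical space
print(np.round(aligned, 1))
</code>
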
- | |||
==== Week 2 - Oct 16th - Bailey - DBH4013 ====
**Abstract**: Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations.

[[http://arxiv.org/pdf/1009.5358|http://arxiv.org/pdf/1009.5358]] Slides: {{:task-driven_dictionary_learning.pdf|task-driven_dictionary_learning.pdf}}
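
For context, learning the dictionary in the unsupervised setting is the matrix factorization X ≈ DA with sparse codes A, solved by alternating sparse coding and dictionary updates. A minimal numpy sketch of that baseline is below; the paper's task-driven formulation additionally propagates a supervised loss through the codes, which this sketch does not attempt. The toy data and hyperparameters are invented.

<code python>
# Minimal sketch of the unsupervised baseline: dictionary learning as the
# matrix factorization X ~ D @ A with sparse codes A, alternating ISTA
# sparse coding with a ridge-regularized least-squares dictionary update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 500))            # 500 toy signals of dimension 64
k, lam, n_outer = 128, 0.1, 20                # atoms, l1 weight, outer iterations

D = rng.standard_normal((64, k))
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms

def ista(X, D, lam, steps=50):
    """Sparse coding: argmin_A 0.5*||X - D@A||_F^2 + lam*||A||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(steps):
        A -= (D.T @ (D @ A - X)) / L                            # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)   # soft-threshold
    return A

for _ in range(n_outer):
    A = ista(X, D, lam)                       # fix D, solve for sparse codes
    # Fix A, update D by least squares (a tiny ridge keeps the system well
    # posed), then renormalize atoms to unit norm.
    D = np.linalg.solve(A @ A.T + 1e-6 * np.eye(k), A @ X.T).T
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)

print("relative reconstruction error:",
      np.linalg.norm(X - D @ A) / np.linalg.norm(X))
</code>
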
==== Week 3 - Oct 23rd - Raúl - DBH4013 ====
**Paper**: NYC3DCars: A Dataset of 3D Vehicles in Geographic Context

**Abstract**: Geometry and geography can play an important role in recognition tasks in computer vision. To aid in studying connections between geometry and recognition, we introduce NYC3DCars, a rich dataset for vehicle detection in urban scenes built from Internet photos drawn from the wild, focused on densely trafficked areas of New York City. Our dataset is augmented with detailed geometric and geographic information, including full camera poses derived from structure from motion, 3D vehicle annotations, and geographic information from open resources, including road segmentations and directions of travel. NYC3DCars can be used to study new questions about using geometric information in detection tasks, and to explore applications of Internet photos in understanding cities. To demonstrate the utility of our data, we evaluate the use of the geographic information in our dataset to enhance a parts-based detection method, and suggest other avenues for future exploration.

[[http://nyc3d.cs.cornell.edu/static/paper.pdf|nyc3d.cs.cornell.edu/static/paper.pdf]]
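
As a toy illustration of how such geographic data might enhance a detector, the hypothetical sketch below down-weights vehicle hypotheses that fall off a road mask; the detections, mask, and penalty are invented, and the paper's actual model integrates richer cues such as direction of travel.

<code python>
# Hypothetical sketch of rescoring detections with a geographic prior:
# down-weight vehicle hypotheses whose ground-plane footprint falls off a
# road mask. Detections, mask, and penalty are invented for illustration.
import numpy as np

road_mask = np.zeros((100, 100), dtype=bool)
road_mask[40:70, :] = True                       # toy "road" band in map coordinates

# Each detection: ((x, y) footprint center in map coordinates, detector score).
detections = [((50, 55), 1.2), ((50, 10), 1.4), ((80, 60), 0.3)]

def geo_rescore(dets, mask, off_road_penalty=2.0):
    """Subtract a fixed penalty from hypotheses that land off the road."""
    out = []
    for (x, y), score in dets:
        on_road = mask[int(y), int(x)]
        out.append(((x, y), score if on_road else score - off_road_penalty))
    return out

for (x, y), s in geo_rescore(detections, road_mask):
    print(f"detection at ({x}, {y}): rescored score {s:+.2f}")
</code>
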
==== Week 4 - Oct 30th - Peiyun - DBH4011 ====
**Papers**:

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation \\
//Jonathan Tompson, Arjun Jain, Yann LeCun, Christoph Bregler// \\
[[http://arxiv.org/abs/1406.2984|http://arxiv.org/abs/1406.2984]]

Deformable Part Models are Convolutional Neural Networks \\
//Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik// \\
[[http://arxiv.org/abs/1409.5403|http://arxiv.org/abs/1409.5403]]
==== Week 5 - Nov 6th - Shu - DBH4011 ====
==== Week 9 - Dec 4th - Sam - DBH4011 ====
**Paper**: Local Decorrelation for Improved Pedestrian Detection

**Abstract**: Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.

[[http://vision.ucsd.edu/~pdollar/files/papers/NamNIPS14ldcf.pdf|vision.ucsd.edu/~pdollar/files/papers/NamNIPS14ldcf.pdf]]
==== Week 10 - Dec 11th - James - DBH4011 ====

**Paper**: Filter Forests for Learning Data-Dependent Convolutional Kernels

**Abstract**: We propose ‘filter forests’ (FF), an efficient new discriminative approach for predicting continuous variables given a signal and its context. FF can be used for general signal restoration tasks that can be tackled via convolutional filtering, where it attempts to learn the optimal filtering kernels to be applied to each data point. The model can learn both the size of the kernel and its values, conditioned on the observation and its spatial or temporal context. We show that FF compares favorably to both Markov random field based and recently proposed regression forest based approaches for labeling problems in terms of efficiency and accuracy. In particular, we demonstrate how FF can be used to learn optimal denoising filters for natural images as well as for other tasks such as depth image refinement and 1D signal magnitude estimation. Numerous experiments and quantitative comparisons show that FFs achieve accuracy at par or superior to recent state of the art techniques, while being several orders of magnitude faster.

[[http://research.microsoft.com/pubs/217099/CVPR2014ForestFiltering.pdf|research.microsoft.com/pubs/217099/CVPR2014ForestFiltering.pdf]]
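
The filtering model at the heart of FF can be illustrated with its simplest special case: fitting a single denoising kernel by ridge regression from noisy patches to clean center pixels. The sketch below does exactly that on synthetic data; the full method additionally learns a tree that routes each pixel, by its context, to a leaf with its own kernel. The image, noise level, and regularizer are invented for illustration.

<code python>
# Minimal sketch of the FF leaf model in isolation: fit one denoising kernel
# by ridge regression from noisy p x p patches to the clean center pixel.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
clean = np.cumsum(np.cumsum(rng.standard_normal((64, 64)), 0), 1)  # smooth toy image
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

p, lam = 5, 1.0                                     # kernel size, ridge weight
X = sliding_window_view(noisy, (p, p)).reshape(-1, p * p)   # noisy patches
y = clean[p // 2:-(p // 2), p // 2:-(p // 2)].ravel()       # clean center pixels

w = np.linalg.solve(X.T @ X + lam * np.eye(p * p), X.T @ y)
kernel = w.reshape(p, p)                            # learned "optimal" filter

denoised = np.einsum('ijkl,kl->ij', sliding_window_view(noisy, (p, p)), kernel)
target = clean[p // 2:-(p // 2), p // 2:-(p // 2)]
print("noisy MSE:   ", np.mean((noisy[p // 2:-(p // 2), p // 2:-(p // 2)] - target) ** 2))
print("denoised MSE:", np.mean((denoised - target) ** 2))
</code>
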
==== Week 11 - Dec 18th - Greg - DBH4011 ====
Paper: