C. E. Shannon, Coding theorems for a discrete source with a fidelity criterion, pp.325-350, 1993.

K. Rose, Deterministic annealing for clustering, compression, classification, regression, and related optimization problems, Proceedings of the IEEE, vol.86, issue.11, pp.2210-2239, 1998.

R. Dobrushin and B. Tsybakov, Information transmission with additional noise, IRE Transactions on Information Theory, vol.8, issue.5, pp.293-304, 1962.

N. Tishby, F. C. Pereira, and W. Bialek, The information bottleneck method, Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, pp.368-377, 1999.

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

K. P. Murphy, Machine Learning: A Probabilistic Perspective, 2012.

H. Witsenhausen and A. Wyner, A conditional entropy bound for a pair of discrete random variables, IEEE Transactions on Information Theory, vol.21, pp.493-501, 1975.

N. Slonim and N. Tishby, The power of word clusters for text classification, 23rd European Colloquium on Information Retrieval Research (ECIR), pp.1-12, 2001.

N. Slonim, R. Somerville, N. Tishby, and O. Lahav, Objective classification of galaxy spectra using the information bottleneck method, Monthly Notices of the Royal Astronomical Society, vol.323, pp.270-284, 2001.

R. M. Hecht, E. Noor, and N. Tishby, Speaker recognition by Gaussian information bottleneck, INTERSPEECH. ISCA, pp.1567-1570, 2009.

M. Vera, L. R. Vega, and P. Piantanida, The two-way cooperative information bottleneck, IEEE International Symp. on Information Theory, pp.2131-2135, 2015.

N. Tishby and N. Zaslavsky, Deep learning and the information bottleneck principle, 2015 IEEE Information Theory Workshop, pp.1-5, 2015.

M. Vera, L. R. Vega, and P. Piantanida, Collaborative representation learning, 2016.

G. Pichler, P. Piantanida, and G. Matz, Distributed information-theoretic biclustering, CoRR, 2016.

Q. Yang, P. Piantanida, and D. Gunduz, The multi-layer information bottleneck problem, IEEE Information Theory Workshop, 2017. URL: https://hal.archives-ouvertes.fr/hal-01742326

S. Arimoto, An algorithm for computing the capacity of arbitrary discrete memoryless channels, IEEE Transactions on Information Theory, vol.18, issue.1, pp.14-20, 1972.

R. Blahut, Computation of channel capacity and rate-distortion functions, IEEE Transactions on Information Theory, vol.18, issue.4, pp.460-473, 1972.

F. M. Willems, Computation of the Wyner-Ziv rate-distortion function, 1983.

G. Chechik and N. Tishby, Extracting relevant structures with side information, Advances in Neural Information Processing Systems 15 (NIPS 2002), pp.857-864, 2002.

G. Kumar and A. Thangaraj, Computation of secrecy capacity for more-capable channel pairs, IEEE International Symposium on Information Theory (ISIT), 2008.

K. Yasui, T. Suko, and T. Matsushima, An algorithm for computing the secrecy capacity of broadcast channels with confidential messages, IEEE International Symposium on Information Theory (ISIT), 2007.

K. Yasui and T. Matsushima, Toward computing the capacity region of degraded broadcast channel, IEEE International Symposium on Information Theory (ISIT), 2010.

R. Caruana, Multitask learning, Machine Learning, vol.28, pp.41-75, 1997.

J. Baxter, A model of inductive bias learning, Journal of Artificial Intelligence Research, vol.12, pp.149-198, 2000.

Y. Zhang and Q. Yang, A Survey on Multi-Task Learning, 2017.

R. K. Ando and T. Zhang, A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, Journal of Machine Learning Research, vol.6, pp.1817-1853, 2005.

C. Ciliberto, Y. Mroueh, T. Poggio, and L. Rosasco, Convex Learning of Multiple Tasks and their Structure, 2015.

A. Argyriou, T. Evgeniou, and M. Pontil, Convex Multi-task Feature Learning, Machine Learning, vol.73, issue.3, pp.243-272, 2008.

M. L. Zhang and Z. H. Zhou, A Review on Multi-Label Learning Algorithms, IEEE Transactions on Knowledge and Data Engineering, vol.26, issue.8, pp.1819-1837, 2014.

A. D. Wyner and J. Ziv, The rate-distortion function for source coding with side information at the decoder, IEEE Transactions on Information Theory, vol.22, pp.1-10, 1976.

M. Li and P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 1997.

T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications and Signal Processing, 2006.

R. T. Rockafellar, Convex Analysis, 1970.

A. El Gamal and Y. Kim, Network Information Theory, 2012.

R. Gallager, Information Theory and Reliable Communication, 1968.

O. Shamir, S. Sabato, and N. Tishby, Learning and generalization with the information bottleneck, Theoretical Computer Science, vol.411, pp.2696-2711, 2010.

R. Shwartz-Ziv and N. Tishby, Opening the black box of deep neural networks via information, CoRR, 2017.

I. S. Dhillon, S. Mallela, and R. Kumar, A divisive information theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, vol.3, pp.1265-1287, 2003.

A. Vinokourov and M. Girolami, A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections, Journal of Intelligent Information Systems, vol.18, issue.2-3, pp.153-172, 2002.

C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

K. Lang, NewsWeeder: Learning to filter netnews, Proceedings of the 12th International Machine Learning Conference (ML95), 1995.