Panel: Big -- The Value of Data

The panel will be held in the combined room Alto & Mezzo & Tempo on the first floor.

While the conference offers in-depth talks about scalable methods for the analysis of big data, the panel brings together diverse facets of gathering, cleaning, exploring, and analyzing data. Data is no longer considered to be merely the ground in which we mine for interesting information. Instead, the data itself is considered to have a high value -- like a raw diamond that only needs to be polished.

The panel will discuss the following facets:

  1. Transparency and Privacy: On the one hand, the public wants to see more facts and asks for transparency of governmental data;
    on the other hand, citizens want their data to be protected and ask for privacy.
    What is a secret and what is not? Who guarantees transparency or privacy? Is privacy by design the answer?
  2. Reproducibility and the Wealth of Data Gatherers: It is physically impossible to pass big data to reviewers or colleagues. Hence, the old scheme of testing is no longer feasible. Some methods can only be applied or tested on big data, but most researchers cannot access such data. Is this a danger for science?
    Do we have to believe the owners of big data or can we get access in order to test our hypotheses? How can we achieve reproducibility of results from mining big data?
  3. Data Value -- polishing and mining: Preparing the data for interactive inspection is the real challenge -- once this enhancement has been done, a data mining step is easy to carry out but no longer as interesting as it used to be. Are the enhanced data already the true result?
    Filtering and cleaning the data is a demanding step requiring high-performance (real-time) algorithms, but we still need well-defined, statistically well-founded methods to analyze the prepared data sets. This is a prerequisite for good visualization for interactive inspection.
    Are these positions really contradictory? Or is it just a chicken-and-egg question? Is the well-known CRISP model now outdated? If so, how would we like to update it?

Panel Moderator

Katharina Morik

Katharina Morik is a full professor of computer science at TU Dortmund University. She pioneered machine learning in Germany and co-chaired the European Conference on Machine Learning (and Principles of Knowledge Discovery in Data) in 1989 and 2008 (with Bart Goethals). As a member of the first steering committee, she got involved with the IEEE International Conference on Data Mining early on, chairing the program in 2004 (with Rajeev Rastogi). She is the speaker of the Collaborative Research Center "Providing Information by Resource-Constrained Data Analysis" (SFB876), which brings together data mining and embedded systems. She is on the editorial boards of Data Mining and Knowledge Discovery and Knowledge and Information Systems.

"The panel reflects the relations between different aspects of big data. We'll discuss how to benefit from data and circumvent dangers."

Panelists

Tina Eliassi-Rad

Tina Eliassi-Rad is an Associate Professor of Computer Science at Rutgers University. Before joining academia, she was a Principal Investigator at Lawrence Livermore National Laboratory. Tina earned her Ph.D. in Computer Sciences at the University of Wisconsin-Madison. Within data mining and machine learning, Tina's research has been applied to the World-Wide Web, text corpora, large-scale scientific simulation data, complex networks, and cyber situational awareness. She has published over 50 peer-reviewed papers (including a best paper runner-up award at ICDM'09 and a best interdisciplinary paper award at CIKM'12). Tina is an action editor for the Data Mining and Knowledge Discovery Journal. In 2010, she received an Outstanding Mentor Award from the US DOE Office of Science and a Directorate Gold Award from Lawrence Livermore National Laboratory for work on cyber situational awareness.

"I think the value of data and the transparency and privacy are highly related. As a science, reproducibility is really important, and some, like Symantec Lab's WINE project, are doing an excellent job here."

Fosca Giannotti

Fosca Giannotti is a director of research at the Information Science and Technology Institute of the National Research Council, ISTI-CNR, Pisa, Italy. Her current research interests include spatio-temporal data mining, privacy-preserving data mining, and social network analysis. She has been the coordinator of various European research projects, including GeoPKDD. She is a member of the steering committee of the FP7 European Coordination Action MODAP: Mobility, Data mining and Privacy. She is the author of more than one hundred and fifty publications and has served on the scientific committees of the main conferences in the area of Databases and Data Mining. She co-chaired the European Conf. on Machine Learning and Knowledge Discovery in Data 2004 and the IEEE Int. Conf. on Data Mining 2008. Currently Fosca Giannotti serves as chair of the Steering Committee of ECML/PKDD.

"Privacy by design is a must, but there is a need for a big shift in the data ownership vision, putting the user who produces the personal data at the center of the game and giving him transparency and the rights of removal and oblivion. This is a new deal on personal data."

Martin Krzywinski

Martin Krzywinski is a staff scientist at the Michael Smith Genome Sciences Center, where he creates visualization tools and information graphics that combine analytical clarity with an artistic dimension. He is the creator of Circos and hive plots. His graduate training is in physics (University of British Columbia). In 1999 he had the opportunity to build the Genome Sciences Center computing infrastructure. He contributed to the field of computer security by creating the port knocking method and introduced a method of optimizing keyboard layouts, which improved the Colemak layout and spawned the only fashion line named after a keyboard layout (TNWMLC).

"As software tools and databases fall out of their maintenance cycle, it becomes more and more difficult to repeat in silico experiments. The publication mechanism, in my opinion, does not stress enough the persistence of research tools."

Srinivasan Parthasarathy

Srinivasan Parthasarathy is a full Professor of Computer Science at Ohio State. His research interests are in Data Mining, High Performance Computing and Systems (broadly defined) and Graph and Network algorithms as they relate to social, biological and web applications. He is a recipient of both the NSF and DOE career awards, multiple research awards from Microsoft, Google and IBM, and multiple best paper awards and nominations from conferences such as IEEE ICDM, SIAM DM, VLDB, SIGKDD, ACM-Bioinformatics and ISMB. He has served on the organizational committees of numerous conferences and editorial boards of leading journals in data mining, databases and high performance computing and is currently serving a term as the co-chair of the steering committee of SIAM Data Mining.

"User interaction is a critical element of big data that is often ignored -- I'll have more to say on this (visualization, abstractions, grand-tour, pre-processing etc.)"

Christopher Ré

Christopher (Chris) Ré is an assistant professor in the department of Computer Sciences at the University of Wisconsin-Madison. The goal of his work is to enable users and developers to build applications that more deeply understand and exploit data. Chris received his PhD from the University of Washington, Seattle under the supervision of Dan Suciu. For his PhD work in the area of probabilistic data management, Chris received the SIGMOD 2010 Jim Gray Dissertation Award. Chris's papers have received four best paper or best-of-conference citations (best paper in PODS 2012, best-of-conference in PODS 2010 twice, and one in ICDE 2009). Chris received an NSF CAREER Award in 2011.

"We are looking at a diverse set of applications: tracking neutrinos in the IceCube neutrino telescope, reading the geology literature to help geologists better understand the earth, analyzing English literature for themes to help English professors, and working with several companies on enterprise applications."

Arno Siebes

Arno Siebes is a full Professor of Computer Science at Utrecht University, Netherlands. His group "Algorithmic Data Analysis" investigates algorithmic questions in designing information systems that have to deal with large, and ever more quickly growing, amounts of data. Such data may be stored in many varieties, from neatly organized databases to unordered documents on the web. The fundamental principles of fitting search methods, data mining, and knowledge discovery, together with the design of algorithmic technology for them, form some of the major challenges in the field. He has published in and reviewed for the most renowned conferences in data mining.

"Research that cannot be duplicated is not research."