PhD Toolbox and data analysis

Academic year 2019/2020

Teaching staff
Dott. Daniel Edward Chamberlain
Prof. Marco Gamba
Stefano Ghignone (Lecturer)
Year
1st year, 2nd year, 3rd year
Teaching period
To be defined
Type
Basic
Credits/Recognition
9
Course disciplinary sector (SSD)
BIO/05 - zoology
BIO/07 - ecology
BIO/09 - physiology
BIO/11 - molecular biology
Delivery
Formal authority
Language
English
Attendance
Obligatory
Type of examination
Practice test
Course summary

Course objectives

The overall objectives of the PhD Toolbox are to enhance the expertise of doctoral students so that they can contribute significantly to their current research, and to improve their competitiveness in the post-doctoral job market through the acquisition of transferable skills. The focus is on technical know-how, with the specific goal of acquiring expertise in advanced programming and statistical techniques in order to carry out exploratory analyses, including data management, the use of graphical tools for data presentation, and the testing of statistical hypotheses. In addition, the course will consider ways to maximise the impact of students’ research, both from the perspective of publications (in particular, the role of bibliometrics) and communication (scientific writing, and the value and use of social media).

Expected learning outcomes

Knowledge and Understanding

  • A thorough understanding of bibliometric indicators (Chamberlain module)
  • An understanding of what reproducible research means and the good practices to achieve it (Chiapello module)
  • An understanding of what R is and why it is an important tool for every scientist (Chiapello module)
  • An understanding of the underlying principles of the use of linear models in statistical analyses (Chamberlain module)
  • An understanding of the underlying principles of the use of multivariate statistics and the most common machine learning algorithms (Gamba module)

 

Applying knowledge and understanding

  • Enhanced scientific writing skills in terms of journal papers (Chamberlain module)
  • The ability to use R to clean, transform and visualise biological data (Chiapello module)
  • The ability to analyse data using linear models in the R programming environment (Chamberlain module)
  • The basics of using R for principal component analysis, clustering, multilayer perceptrons and support vector machines (Gamba module)

 

Communication

  • The ability to use social media to communicate scientific research (Caprio module)
  • Improved clarity of presentation in scientific publications (Chamberlain module)

Course delivery

There are two possible modes of delivery. Teaching in person may be provided in computer laboratories for modules concerning analytical and programming techniques, which will consist of traditional lessons interspersed with practical demonstrations of the use of software tools. Scientific publishing, writing and social media modules will be presented in standard classrooms. Exercises will be set during lessons (some will need to be completed outside the timetabled lessons), followed by class discussions.

In the event that teaching in person is not possible, online teaching will be provided through appropriate online platforms (such as Webex).

Learning assessment methods

Assessment will be based both on exercises given during lessons and on set homework. No formal mark will be given, but credits will be awarded on the basis of delivery of the set exercises.

Program

The course will be divided into 6 modules for a total of 40 hours. A full timetable will be published in due course.

  1. Maximizing the Impact of your Research (Dan Chamberlain & Enrico Caprio) – 2 hours

This module is divided into two parts which will run in alternate years. The use of bibliometric indices (e.g. h-index, i10-index, Impact Factors) to assess the ‘quality’ of both researchers and publications is now common in research. In the first part, we will consider how these indices are used in science and how they relate to the publishing process, ultimately using this information to improve the impact of our published research. We will also discuss the pros and cons of the bibliometric approach. In the second part, we will review the role of social media in communicating research, including the use of altmetrics (essentially, bibliometrics for social media). We will also look at techniques to maximise the reach of social media communications in the scientific community.
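
As a rough illustration of two of the indices named above, the sketch below computes an h-index (the largest h such that h papers have at least h citations each) and an i10-index (the number of papers with at least 10 citations) from a list of per-paper citation counts. The function names and the citation counts are hypothetical and are not taken from the course material.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for cites in citations if cites >= 10)


# Hypothetical citation counts for one researcher's seven papers.
example_citations = [25, 12, 10, 8, 5, 3, 0]
print("h-index:", h_index(example_citations))
print("i10-index:", i10_index(example_citations))
```

Both indices depend only on citation counts, which is one reason the module also discusses the pros and cons of the bibliometric approach.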

  2. Scientific Writing (Dan Chamberlain) – 4 hours

It is extremely important to communicate the results of scientific research in a clear, logical manner. In this module, we will consider how to structure a scientific paper, and then go through each part of the paper in detail, addressing techniques to enhance clarity and readability, and to improve the visual impact of the paper through figures and tables. Although this module is not an English course, we will also look at some common mistakes made by non-native English speakers. The module will include a class discussion exercise and set homework.

  3. Introduction to UNIX Environment and Command Line Basics (Stefano Ghignone) – 6 hours

UNIX is an operating system which was first developed in the 1960s, and has been under constant development ever since. By operating system, we mean the group of programs which make the computer work. UNIX is a stable, multi-user, multi-tasking system for servers, desktops and laptops, and it is popular in bioinformatics because its powerful command-line tools make scripting and automated analyses relatively easy. In this brief introductory course, we will focus on commands, those pesky little words you type at a command-line prompt to tell the system what to do. The logic behind the command-line concept is fundamental for understanding how the R statistical environment works. We will learn the basics of the UNIX environment and how to interact with it through the terminal, how to move around the file system, how to view and edit files, how to manipulate files and directories, and we will work through some basic bioinformatics examples.

  4. R for Data Science (Marco Chiapello) – 12 hours

Computers are increasingly essential to the study of all aspects of biology. Data management skills are needed for entering data without errors, storing it in a usable way, and extracting key aspects of the data for analysis. Basic programming is required for everything from accessing and managing data, to statistical analysis, to modelling. This course will provide an introduction to data management, manipulation, and analysis, with an emphasis on biological problems. Classes will typically consist of short introductions or question & answer sessions, followed by hands-on computing exercises. The course will be taught using R, but the concepts learned will easily apply to all programming languages and database management systems. No background in programming or databases is required.

  5. Generalized Linear Models & Multi-Model Inference (Dan Chamberlain) – 8 hours

This module will address topics in general and generalized linear modelling in R. In the first part, we will consider standard parametric tests for normally distributed data in the context of linear modelling in R. We will develop models from univariate to multivariate analyses, covering key topics such as interactions, assessing model fit and graphical representation of results. The second part will advance from the normal to other common data distributions (Poisson, binomial) and will introduce information-theoretic approaches to model selection, including both multi-model inference and model averaging, which are especially relevant to models where several explanatory variables are under consideration. Introductions to broader concepts in modelling, such as spatial and temporal autocorrelation, Generalized Additive Models, and model testing, will also be included.
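
To give a feel for the information-theoretic approach mentioned above, the sketch below turns a set of AIC values into Akaike weights, using w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2) with Δ_i = AIC_i - AIC_min, and then forms a weighted average of one coefficient across the candidate models. The module itself is taught in R; Python is used here only for illustration. The AIC values, slope estimates and function names are hypothetical, and the averaging assumes the coefficient appears in every candidate model.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    deltas = [aic - best for aic in aic_values]        # AIC differences
    rel_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_likelihoods)
    return [rl / total for rl in rel_likelihoods]


def model_averaged_estimate(estimates, weights):
    """Weighted mean of one coefficient across the candidate models."""
    return sum(w * est for w, est in zip(weights, estimates))


# Hypothetical AIC values and slope estimates from three candidate models.
aic = [100.0, 102.0, 110.0]
slopes = [0.50, 0.40, 0.10]

weights = akaike_weights(aic)
for value, weight in zip(aic, weights):
    print(f"AIC = {value:6.1f}  weight = {weight:.3f}")
print(f"Model-averaged slope: {model_averaged_estimate(slopes, weights):.3f}")
```

The model with the lowest AIC receives the largest weight, and models whose AIC is much higher contribute almost nothing to the averaged estimate.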

  6. Machine Learning with R (Marco Gamba) – 8 hours

This module provides Ph.D. students with the basics of Machine Learning using a hands-on, lab-based and application-oriented approach. The first part of the course will look at how conventional statistical analysis relates to Machine Learning, and compare the two. We will then focus on Unsupervised Learning, exploring the most common techniques, from Clustering and Cluster Validation to Dimensionality Reduction, and discussing the advantages and disadvantages of each algorithm. We will then concentrate on Supervised Learning, using some of the most popular algorithms and introducing the concepts of Classification, Training and Testing Split, Neural Networks, Support Vector Machines, and Feature Extraction & Selection. We will also consider how to present the results of these analyses by exploring different data visualization techniques. On completion of the Machine Learning module, students will be expected to have a good understanding of the fundamental issues and challenges of these topics: they should possess practical knowledge of Supervised and Unsupervised approaches, know the strengths and weaknesses of the most popular techniques, and be able to implement various algorithms in a range of realistic research applications.

Suggested readings and bibliography

Beckerman et al. (2017). Getting Started with R. 2nd Edn. Oxford University Press, Oxford.

Zuur et al. (2009). A Beginner’s Guide to R. Springer, New York.

 

Peng (2012). R Programming for Data Science.

Wickham (2014). Advanced R. 

Grolemund (2014). Hands-On Programming with R.

Grolemund & Wickham (2016). R for Data Science.

 

The last four books can be read for free here: https://bookdown.org/



Last update: 11/03/2021 16:48