A Primer for Computational Biology
Shawn T. O'Neil
Published by Oregon State University Libraries and Press in Partnership with Open Oregon State
This book is available as an Open Access Textbook through Open Oregon State in collaboration with the Center for Genome Research and Biocomputing.
A Primer for Computational Biology
aims to provide life scientists and students with the skills necessary for
research in a data-rich world. The text covers accessing and using
remote servers via the command line and writing programs and pipelines for
data analysis, and it provides useful vocabulary for interdisciplinary
work. The book is broken into three parts:
- Introduction to Unix/Linux:
The command line is the “natural environment” of scientific computing,
and this part covers a wide range of topics, including logging in,
working with files and directories, installing programs and writing
scripts, and the powerful “pipe” operator for file and data
manipulation.
- Programming in Python:
Python is both a premier language for learning and a common choice in
scientific software development. This part covers the basic concepts in
programming (data types, if-statements and loops, functions) via examples
of DNA-sequence analysis. This part also covers more complex subjects in
software development such as objects and classes, modules, and APIs.
- Programming in R:
The R language specializes in statistical data analysis, and is also
quite useful for visualizing large datasets. This third part covers the
basics of R as a programming language (data types, if-statements,
functions, loops and when to use them) as well as techniques for
large-scale, multi-test analyses. Other topics include S3 classes and
data visualization with ggplot2.
About the author
Shawn T. O’Neil earned a BS in computer science from Northern Michigan University, and later an MS and PhD in the same subject from the University of Notre Dame. His past and current research focuses on bioinformatics. O’Neil has developed and taught several courses in computational biology at both Notre Dame and Oregon State University.
Table of Contents
Preface
Acknowledgements
Dedication
Part I: Introduction to Unix/Linux
Context
Logging In
The Command Line and Filesystem
Working with Files and Directories
Permissions and Executables
Installing (Bioinformatics) Software
Command Line BLAST
The Standard Streams
Sorting, First and Last Lines
Rows and Columns
Patterns (Regular Expressions)
Miscellanea
Part II: Programming in Python
Hello, World
Elementary Data Types
Collections and Looping: Lists and for
File Input and Output
Conditional Control Flow
Python Functions
Command Line Interfacing
Dictionaries
Bioinformatics Knick-knacks and Regular Expressions
Variables and Scope
Objects and Classes
Application Programming Interfaces, Modules, Packages, Syntactic Sugar
Algorithms and Data Structures
Part III: Programming in R
An Introduction
Variables and Data
Vectors
R Functions
Lists and Attributes
Data Frames
Character and Categorical Data
Split, Apply, Combine
Reshaping and Joining Data Frames
Procedural Programming
Objects and Classes in R
Plotting Data and ggplot2
Files
Index
About the Author