May 23, 2013
 

It’s an episode of firsts on the R-Podcast! In this episode, recorded on location, I had the honor and privilege of interviewing Yihui Xie, author of many innovative packages such as knitr and animation. Some of the topics we discussed include:

  • Yihui’s motivation for creating knitr and some key new features
  • How markdown plays a key role in making reproducible research more accessible
  • An innovative approach for publishing and maintaining reproducible statistical results online

And much more on this “lucky” episode 13 of the R-Podcast!

Episode 13 Show Notes

Resources mentioned during interview with Yihui

R Community Roundup

Package pick

  • Pandoc: Powerful and customizable document conversion

How to interact with the show

  • Submit your questions and comments via the R-Podcast contact page, or send an email to theRcast(at)gmail.com
  • Send in an audio comment via audio attachment to theRcast(at)gmail.com, or leave a voicemail on the R-Podcast voicemail hotline: +1-269-849-9780
  • Get show updates via our Twitter account: @theRcast
  • Follow us on our R-Podcast Google Plus page: gplus.to/thercast
  • Provide your favorite R community links at the R-Podcast subreddit: links.r-podcast.org/

Music Credits

Apr 4, 2013
 

Title

This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages (click the MD toolbar button for help on Markdown).

When you click the Knit HTML button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120

You can also embed plots, for example:

plot(cars)

[Figure: plot of chunk unnamed-chunk-2]

Apr 1, 2013
 

This is not an April Fool’s joke … The R-Podcast is back once again! In this episode, I discuss the concept of version control and how you can get started with using the Git VCS right now with your R projects. I also discuss a big batch of listener feedback, and highlight a couple of great visualization applications from the community using ggplot2. All of that and more on episode 12 of the R-Podcast!

Episode 12 Show Notes

The basics for version control and Git

Listener Feedback

R Community Roundup

Package pick

  • reports: An R package to assist in the workflow of writing academic articles and other reports (via TRinker’s blog)

Music Credits

Nov 13, 2012
 

Season 2 of the R-Podcast is up and running! This episode begins a multi-part series on reproducible analysis using R. In this episode I discuss the usage of Sweave and LaTeX for producing reproducible reports, an introduction to the capabilities of the knitr package (more episodes will be coming dedicated to this package), and my motivation for adopting reproducible analysis techniques and tools into my workflow. In our listener feedback segment I discuss a new means of providing feedback to the R-Podcast using our new subreddit page and introduce new segments highlighting interesting stories around the R community and useful packages. This promises to be an exciting season of the R-Podcast, and I hope you enjoy this episode!

The following resources are mentioned in this episode:

Episode 11 Time Stamps

00:00 The R-Podcast #011 Reproducible Analysis Part 1
00:40 Introduction
02:43 Reproducible Research: Introduction
08:18 Sweave overview
16:20 Knitr overview
20:20 The Duke University Research Saga
30:56 What version control can offer
38:34 Presenting results
42:18 Listener feedback
60:55 R community roundup
69:39 Package pick: plyr
72:04 Wrapping up: subscribe at www.r-podcast.org, theRcast@gmail.com, +1-269-849-9780, Twitter @theRcast, Google Plus, links.r-podcast.org
77:21 End
Sep 16, 2012
 

I’m happy to present episode 10 of the R-Podcast! Season 1 of the R-Podcast concludes with part 2 of my series on data munging, in which I discuss issues surrounding importing data sets contained in HTML tables. I share how I used the XML and RCurl packages to validate and import data from hockey-reference.com for storage into a MySQL database. Our listener feedback segment contains another installment on the Pitfalls of R contributed by listener Frans. I want to thank everyone who has provided such positive feedback throughout the season, and I’m looking forward to providing some exciting new content for season 2. I hope you enjoy the episode and check out our new contact page if you would like to provide any feedback. Thanks for listening!

The following resources are mentioned in this episode:

Episode 10 Time Stamps

00:00 The R-Podcast #010 Adventures in Data Munging Part 2
00:33 Introduction
01:50 Wrapping up season 1 ... wait, what?
03:30 RStudio team expands
05:41 R Community milestone
07:53 Discovering hockey-reference.com 
10:54 Tips for readHTMLTable
21:10 Checking for valid data first
29:23 Minor processing needed
35:18 Saving data to MySQL database
45:26 Listener Feedback: Andrew
54:58 Frans: Pitfalls of R segment 2
63:40 Wrapping up: subscribe to the podcast, theRcast@gmail.com, +1-269-849-9780, Twitter @theRcast
69:14 End
Aug 5, 2012
 

It’s great to be back with a new episode after an eventful break! This episode begins a series on my adventures in data munging, a.k.a. data processing. I discuss three issues that demonstrate the flexibility and versatility R brings to recoding messy values, importing inconsistent data files, and pinpointing problematic observations and variables. We also have an extended listener feedback segment with an audio installment of the “pitfalls” of R contributed by listener Frans. I hope you enjoy this episode; keep passing along your feedback to theRcast(at)gmail.com and stop by the forums as well!
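As a rough illustration of the kind of recoding and problem-spotting described above, here is a minimal base R sketch. The data frame `games`, its columns, and the `-99` missing-value code are made up for illustration and are not taken from the episode. The episode covers recoding with the car package; this sketch swaps in a plain lookup vector so that it runs without extra packages.

```r
# Hypothetical messy data: inconsistent team labels and a sentinel value
games <- data.frame(
  team = c("DET", "det", "Detroit", "CHI", "chi ", NA),
  attendance = c(20066, -99, 19500, 21000, 20500, 18000),
  stringsAsFactors = FALSE
)

# Recode messy labels: normalise case and whitespace, then map each
# variant to one canonical code through a named lookup vector
clean_team <- toupper(trimws(games$team))
lookup <- c(DET = "DET", DETROIT = "DET", CHI = "CHI")
games$team <- unname(lookup[clean_team])

# Treat the (assumed) sentinel value -99 as missing
games$attendance[games$attendance == -99] <- NA

# Pinpoint problematic observations (rows) and variables (columns)
bad_rows <- which(!complete.cases(games))   # rows with any missing value
na_per_column <- colSums(is.na(games))      # count of missing values per column

print(games[bad_rows, ])
print(na_per_column)
```

Any label that is not in the lookup vector comes back as `NA`, so unexpected values show up in `bad_rows` instead of slipping through silently.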

The following resources are mentioned in this episode:

Episode 9 Time Stamps

00:00 The R-Podcast #009: Adventures in Data Munging Part 1
00:31 Introduction
01:38 Big news: +1
03:53 R 2.15.1 released
04:26 UseR! 2012
07:20 Hockey Summary Project
10:30 Dealing with empty files
15:18 Importing inconsistent data files
28:15 Recoding using car package
35:08 Useful functions for pinpointing issues
44:55 Listener Feedback
45:14 Daniel: Advice on data munging
55:01 Frans: Pitfalls of R
66:28 Wrapping up: subscribe to the podcast, theRcast@gmail.com, +1-269-849-9780, Twitter @theRcast, Google Plus
71:22 End
Jun 23, 2012
 

Here is the second screencast episode of the R-Podcast to accompany episode 8 of the R-Podcast: Visualization with ggplot2. In this screencast I demonstrate a real-time session of using ggplot2 to create boxplots for a visualization of hockey attendance in the NHL. The R code created in this screencast is available in our GitHub repository, and each of the online resources is linked below. I added some new tweaks to the recording of this screencast based on feedback from the first screencast episode. Please let me know what you think of this improved screencast! As always you can send your feedback via email or audio comment to theRcast(at)gmail.com, leave a voicemail on our voicemail hotline at +1-269-849-9780, or join our new forums and leave a comment for this episode!

The following resources are mentioned in this episode:

Jun 20, 2012
 

I’m happy to present this jam-packed episode of the R-Podcast dedicated to using the ggplot2 package for visualization. This episode will have a companion screencast released in the next few days. I use data from the Hockey Summary Project to demonstrate how to create a series of boxplots of NHL regular season attendance for each team. The R code used in this episode will be available via GitHub. I also extend my thanks to the Going Linux podcast for plugging the R-Podcast. If you are interested in providing a listener tip about R, please call the voicemail hotline at +1-269-849-9780 or record an audio clip and send it to theRcast(at)gmail.com. Please keep the wonderful feedback coming, and I hope you enjoy this episode!

The following resources are mentioned in this episode:

Episode 8 Time Stamps

00:00 The R-Podcast #008 Visualization with ggplot2
00:34 Introduction
01:45 Thank you Going Linux
05:01 Listener feedback
14:14 ggplot2 background and philosophy
23:00 Description of data 
30:20 Setting up our plot with ggplot function
38:15 Adding boxplot layer
44:31 Customizing appearance
60:35 Facet by era
67:02 Making code reproducible
73:03 Helpful ggplot2 resources
85:30 Wrapping up: subscribe to the podcast, theRcast@gmail.com, +1-269-849-9780, Twitter @theRcast
89:29 End
May 28, 2012
 

Hello everybody, I am finally back with a new episode! In this episode: hardware issues, a major update to RStudio, new forums, and a discussion on managing your workflow for projects. I discuss useful functions for executing R scripts and saving/loading R objects for future sessions, and summarize different solutions for organizing R code based on task and via the ProjectTemplate package, along with the importance of version control. Please check out the new forums and let me know what you think! If you are interested in providing a listener tip about R, please call the voicemail hotline at +1-269-849-9780 or record a short mp3 or ogg audio clip and send it to theRcast(at)gmail.com. As always I welcome any other feedback you have. Thanks for listening!

P.S. From our Google Plus page, Darren pointed out that I switched forward slashes with backward slashes in my discussion about file paths in Episode 6. Thanks Darren!
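For readers who want to try the script and object functions mentioned above, here is a minimal sketch using only base R. The directory, the helper function `double_it`, and the file names are hypothetical and exist only to keep the example self-contained. It also builds paths with `file.path()`, which avoids the forward-slash versus backslash confusion noted in the P.S.

```r
# Work in a temporary directory so the sketch leaves nothing behind
proj_dir <- tempfile("workflow_demo")
dir.create(proj_dir)

# file.path() joins path components with forward slashes
script_file <- file.path(proj_dir, "func.R")
data_file   <- file.path(proj_dir, "results.RData")

# Write a tiny helper script (hypothetical content, for illustration only)
writeLines("double_it <- function(x) x * 2", script_file)

# source() runs the script, which defines double_it() in the session
source(script_file)
results <- double_it(1:5)

# save() stores selected objects; save.image() would store the whole workspace
save(results, file = data_file)

# Simulate a fresh session, then restore the object with load()
rm(results)
restored <- load(data_file)   # load() returns the names of the restored objects

print(restored)
print(results)
```

Saving only the objects you need with `save()` keeps the `.RData` file small and makes it clear what a later session depends on, whereas `save.image()` captures everything currently in the workspace.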

The following resources are mentioned in this episode:

Episode 7 Time Stamps

00:00 The R-Podcast #007 Best Practices for Workflow Management
00:31 Introduction
01:07 No more TV recording for now
03:40 New forums!
08:25 RStudio update v0.96
12:50 Listener feedback
19:35 Using source(), save(), save.image(), and load()
25:00 load.R, clean.R, func.R, do.R
29:50 ProjectTemplate
40:06 Version Control with Git, RStudio
46:30 Wrapping up: subscribe to the podcast, theRcast@gmail.com, +1-269-849-9780, Twitter @theRcast
52:44 End
Apr 29, 2012
 

In this episode: Listener feedback and importing data from external sources into R. We dive into the basics of importing delimited text files using read.table and its variants. We also discuss recommendations for importing MS Excel spreadsheet files, relational databases such as MySQL, data from HTML tables, and files produced by other statistical computing packages. If you are interested in providing a listener tip about R in audio format, please call the voicemail hotline at +1-269-849-9780 or record a short mp3 or ogg audio clip and send it to theRcast(at)gmail.com. Hope you enjoy the episode!
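To make the read.table discussion concrete, here is a minimal sketch that reads the same small data set with `read.table()` and two of its variants. The player data is invented for illustration, and it is passed inline through the `text` argument so no external file is needed; with a real file you would pass a file path as the first argument instead.

```r
# A small comma-delimited data set, inlined as text (made-up values)
csv_text <- "player,team,goals
Smith,DET,12
Jones,CHI,NA
Lee,DET,7"

# read.table() with the key arguments spelled out explicitly
players <- read.table(text = csv_text, header = TRUE, sep = ",",
                      na.strings = "NA", stringsAsFactors = FALSE)

# read.csv() is a variant whose defaults already assume a header and commas
players_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# read.delim() is the matching variant for tab-delimited data
tsv_text <- gsub(",", "\t", csv_text, fixed = TRUE)
players_tsv <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# Inspect column types to confirm the import behaved as intended
str(players)
```

Checking the result with `str()` right after importing is a cheap way to catch columns that were read as the wrong type, for example numbers read as character because of a stray symbol.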

The following resources are mentioned in this episode:

Episode 6 Time Stamps

00:00 The R-Podcast #006 Importing Data from External Sources
00:34 Introduction
01:46 Listener Feedback
07:45 Description of delimited text files
09:18 Using read.table and key arguments
18:17 R Data Import-Export Manual
19:10 Importing spreadsheet data considerations
21:10 XLConnect package advantages
25:20 Importing HTML tables using XML package
33:55 Using RMySQL with MySQL databases
43:52 Data from other statistical software
44:18 The foreign package
45:45 sas7bdat package
49:13 Wrapping up: subscribe to the podcast, theRcast@gmail.com, +1-269-849-9780
53:54 End