Wordclouds from ORCiD

I keep seeing nice little wordclouds summarizing research profiles from different groups. I’ve played with them myself. Drawing is simple, but getting the data can be a pain. Now the new rorcid package makes that easy. If you don’t already have them, you need the following installed:

## install packages using the following
install.packages("devtools") # to install from github
library(devtools)
install_github("ropensci/rorcid") # to access orcid through R
install.packages(c("tm","wordcloud")) # to draw word clouds

I’ve put a few convenience functions in a gist, so do

Continue reading

Reading .xlsx files into R, quickly

The gdata package is fantastic for reading small .xls or .xlsx files into R, but it can be very slow for large files. Today I had to read data from the second sheet of a 64 MB .xlsx file. It should have gone something like

library(gdata)
my.data <- read.xls("myfile.xlsx", sheet=2, header=TRUE)

but R just hung there, unresponsive, for minutes.

Continue reading

Coloc v2.3

Coloc v2.3 is up on CRAN.

2013-09-25  Chris Wallace  <chris.wallace@cimr.cam.ac.uk>

        * v2.3
        BUGFIX: Introduced a function to estimate trait variance
        from supplied coefficients and standard errors.  This is used
        within the approach implemented in coloc.abf(), and replaces the
        earlier version which implicitly assumed that var(Y)=1 for
        quantitative traits, which could lead to incorrect inference when
        var(Y) was far from 1.
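As a rough sketch of the idea behind this fix (not necessarily the code that ships in coloc): for a quantitative trait, var(beta) is approximately var(Y) / (n * var(X)), with var(X) = 2 * maf * (1 - maf) for a SNP under Hardy-Weinberg. So n * var(X) regressed on 1/var(beta), through the origin, has slope close to var(Y). The function name estimate_var_y and the inputs maf and n are my assumptions for illustration.

```r
# Hypothetical helper, for illustration only.
# vbeta: squared standard errors of the per-SNP coefficients
# maf:   minor allele frequencies of the same SNPs (assumed available)
# n:     sample size (assumed available)
estimate_var_y <- function(vbeta, maf, n) {
  nvx <- 2 * n * maf * (1 - maf)  # n * var(X) under Hardy-Weinberg
  oneover <- 1 / vbeta
  fit <- lm(nvx ~ oneover - 1)    # regression through the origin
  unname(coef(fit)[["oneover"]])  # the slope estimates var(Y)
}

# Toy check: build standard errors for a trait whose true variance is 4
maf <- c(0.1, 0.2, 0.3, 0.4)
n <- 1000
vbeta <- 4 / (2 * n * maf * (1 - maf))
est <- estimate_var_y(vbeta, maf, n)
```

With real data the relationship is only approximate, so the estimate will be noisy rather than exact.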

Pretty printing progress bars from .Call in R

I wanted to run something slow in C (called from R) and print a progress bar while it worked.

In the R function, I did

pBar <- txtProgressBar( min = 0, max = n, style = 3 )
ret <- .Call("myfunction", arg1, arg2, as.integer(type), pBar, PACKAGE="annotSnpStats")

In the C function, I had

SEXP myfunction(SEXP arg1, SEXP arg2, SEXP Rtype, SEXP pBar) {

  int nprotect=0;
  SEXP utilsPackage, percentComplete;
  PROTECT(utilsPackage = eval(lang2(install("getNamespace"), ScalarString(mkChar("utils"))), R_GlobalEnv));
  PROTECT(percentComplete = allocVector(INTSXP, 1));
  nprotect+=2;
  int *rPercentComplete = INTEGER(percentComplete);
 
  ...

  for(i=0; i<nx; i++) { // index rows of x
    *rPercentComplete = i; // this value increments
    eval(lang4(install("setTxtProgressBar"), pBar, percentComplete, R_NilValue), utilsPackage);
  }

  ... 

  UNPROTECT(nprotect);
  return(myreturn);

}

And … it … just … worked! Happy day.

The full R function is dups in snp-match.R, and the C code is countdiffs in comparisons.c.
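For comparison, the C loop is just evaluating the same call you would make from R on each iteration. Here is that pattern in plain R; the loop body, the Sys.sleep() delay and n = 50 are placeholders of mine, not anything taken from annotSnpStats.

```r
n <- 50
pBar <- txtProgressBar(min = 0, max = n, style = 3)
for (i in seq_len(n)) {
  Sys.sleep(0.01)             # stand-in for the slow work done in C
  setTxtProgressBar(pBar, i)  # the call the C code builds with lang4()
}
close(pBar)
final_value <- getTxtProgressBar(pBar)
```

Doing the update from C keeps the loop itself in compiled code while still letting R draw the bar.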

Coloc v2.2

Coloc v2.2 is up on CRAN. The last two changes are shown below. The important thing is that the arguments to coloc.abf() have changed. Please do revise your code!

I do try to avoid completely changing the arguments of released functions, but in this case the function was introduced relatively recently, and the change makes sense: a single function can now analyse datasets for which coefficients and standard errors are available as well as datasets for which only p values and minor allele frequencies are available.

2013-06-19  Chris Wallace  <chris.wallace@cimr.cam.ac.uk>

        * v2.2
        Merged coloc.abf() and coloc.abf.imputed(), so that datasets for
        which beta, var(beta) are available can be matched to datasets
        with only p values and maf.

        This means the arguments to coloc.abf() have been changed!  Please
        check ?coloc.abf for the new function.

2013-03-06  Chris Wallace  <chris.wallace@cimr.cam.ac.uk>

        * v2.1
        Bug fix for coloc.abf() function, which used p12 instead of
        log(p12) to calculate L4.

        New function coloc.abf.imputed() to make better use of fuller
        information on imputed data.
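To see why the p12 versus log(p12) fix in v2.1 matters, here is a small sketch. It assumes that L4 is accumulated on the log scale as a log prior plus a log-sum of per-SNP log Bayes factors. The numbers and variable names are made up for illustration and are not taken from the package source.

```r
# Numerically stable log(sum(exp(x)))
logsum <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

p12 <- 1e-5                    # illustrative prior probability
lbf <- c(-2.0, 0.5, 3.1, 1.2)  # made-up per-SNP log Bayes factors

l4_wrong <- p12 + logsum(lbf)       # adds a raw probability to a log-scale sum
l4_right <- log(p12) + logsum(lbf)  # keeps every term on the log scale

# On the natural scale the correct version is the prior times the summed Bayes factors
direct <- p12 * sum(exp(lbf))
```

Here exp(l4_right) matches direct, while the uncorrected version effectively ignores the prior.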