Classwork for BIMM143
Blinda Sui (PID: A17117043)
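All functions in R have at least 3 things: a name (we choose this), input arguments (there can be several, comma-separated), and a body (the code that does the work).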
Our first wee function:
add <- function(x, y = 1) {
  x + y
}
Let’s test our function:
add(c(1, 2, 3), y=10)
[1] 11 12 13
add(10)
[1] 11
add(10, c(10, 120))
[1] 20 130
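The three calls above depend on two R behaviors: the default y = 1 is used whenever y is omitted, and + is vectorized, so a length-1 argument is recycled across the longer one. A minimal sketch that restates this with stored results (the variable names default_result, recycled_result, and no_x are illustrative only):

```r
# Same definition as above, repeated so this block runs on its own.
add <- function(x, y = 1) {
  x + y
}

# y omitted: the default y = 1 is used.
default_result <- add(10)

# A length-1 x is recycled across the longer y.
recycled_result <- add(10, c(10, 120))

# x has no default, so calling add() with no arguments is an error.
no_x <- tryCatch(add(), error = function(e) "error: x is missing")
```

Only arguments with a default, such as y here, can be left out of a call.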
Let’s try something more interesting. Make a sequence generation tool.
The ‘sample()’ function could be useful here.
sample(1:10, size = 3)
[1] 8 5 4
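Change this to work with the nucleotides A, C, G, and T and return 3 of them. To draw more than 4, set replace = TRUE, because sampling without replacement cannot return more items than the input holds.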
n <- c("A", "C", "G", "T")
sample(n, size = 15, replace = TRUE)
[1] "C" "G" "A" "A" "G" "C" "G" "A" "G" "T" "T" "G" "G" "A" "G"
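Because sample() is random, the output above will differ on every run. A minimal sketch of two points worth checking: the call fails without replace = TRUE when size exceeds the number of nucleotides, and set.seed() makes a draw repeatable. The seed value 42 and the variable names are arbitrary choices for illustration.

```r
n <- c("A", "C", "G", "T")

# Without replacement, asking for more items than exist is an error.
too_many <- tryCatch(
  sample(n, size = 15),
  error = function(e) "error"
)

# Setting the same seed before each draw gives the same sequence.
set.seed(42)  # arbitrary seed, chosen only for this example
first <- sample(n, size = 15, replace = TRUE)
set.seed(42)
second <- sample(n, size = 15, replace = TRUE)

identical(first, second)
```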
Turn this snippet into a function that returns a DNA sequence of a user-specified length. Let’s call it ‘generate_dna()’.
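generate_dna <- function(len = 10, fasta = FALSE) {
  n <- c("A", "G", "C", "T")
  # Draw len nucleotides at random, with replacement
  v <- sample(n, size = len, replace = TRUE)
  # Make a single element vector
  s <- paste(v, collapse = "")
  cat("Well done you!\n")
  # fasta = TRUE returns one string; otherwise one base per element
  if (fasta) {
    return(s)
  } else {
    return(v)
  }
}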
generate_dna(5)
Well done you!
[1] "T" "A" "C" "A" "C"
s <- generate_dna(15)
Well done you!
s
[1] "G" "A" "T" "G" "T" "T" "C" "T" "T" "T" "T" "G" "G" "C" "C"
I want the option to return a single-element character vector with my sequence all together, like this: “GGAGTAC”
s
[1] "G" "A" "T" "G" "T" "T" "C" "T" "T" "T" "T" "G" "G" "C" "C"
paste(s, collapse = "")
[1] "GATGTTCTTTTGGCC"
generate_dna(10, fasta=TRUE)
Well done you!
[1] "GCACACAGTT"
generate_dna(10, fasta=FALSE)
Well done you!
[1] "C" "A" "T" "G" "T" "A" "C" "C" "G" "G"
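The two return formats differ in shape, not content: fasta=FALSE gives one element per base, while fasta=TRUE gives one element holding all the bases. A short sketch that makes the difference visible with length() and nchar(); the variable names are illustrative, and the function is repeated only so the block runs on its own.

```r
# Same function as above, repeated so this block is self-contained.
generate_dna <- function(len = 10, fasta = FALSE) {
  n <- c("A", "G", "C", "T")
  v <- sample(n, size = len, replace = TRUE)
  s <- paste(v, collapse = "")
  cat("Well done you!\n")
  if (fasta) {
    return(s)
  } else {
    return(v)
  }
}

as_vector <- generate_dna(10, fasta = FALSE)
as_string <- generate_dna(10, fasta = TRUE)

length(as_vector)  # number of elements: one per base
length(as_string)  # a single element
nchar(as_string)   # number of characters inside that element

# strsplit() undoes paste(collapse = ""), giving one base per element again.
split_again <- strsplit(as_string, "")[[1]]
```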
Make a third function that generates a protein sequence of a user-specified length and format.
generate_protein <- function(size = 15, fasta = TRUE) {
  # The 20 standard amino acids, one-letter codes
  aa <- c("A","R","N","D","C","E","Q","G","H","I","L",
          "K","M","F","P","S","T","W","Y","V")
  # Named aa_seq to avoid masking the base function seq()
  aa_seq <- sample(aa, size = size, replace = TRUE)
  if (fasta) {
    return(paste(aa_seq, collapse = ""))
  } else {
    return(aa_seq)
  }
}
Try this out…
generate_protein(10)
[1] "LEKLPYSNAN"
Q. Generate random protein sequences with lengths from 5 to 12 amino acids.
generate_protein(5)
[1] "MMKHM"
generate_protein(6)
[1] "QFRHCS"
One approach is to do this by brute force, calling our function for each length from 5 to 12.
Another approach is to write a ‘for()’ loop to iterate over the input values 5 to 12.
A very useful third, R-specific approach is to use the ‘sapply()’ function.
seq_lengths <- 6:12
for (i in seq_lengths) {
  cat(">", i, "\n")
  cat(generate_protein(i))
  cat("\n")
}
> 6
VTQWIQ
> 7
ERKQLNP
> 8
YCHTHPAH
> 9
MNDNCCYII
> 10
FFKPKKLIDT
> 11
PAHATCDQDRG
> 12
DTEMWSEHKFCH
sapply(6:12, generate_protein)
[1] "NHCCIW" "NLNAPLK" "YAQSFLLN" "NIHMMARLK" "TAYHAKGLQN"
[6] "VPDVAMRKSPD" "PIMECIRSGRSP"
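The sapply() result is a plain character vector, so it can be labeled and printed in a FASTA-like layout without a loop. A minimal sketch, assuming header names of the form "id.6", "id.7", and so on; these IDs are made up for illustration and are not part of the original exercise.

```r
# Same function as above, repeated so this block runs on its own.
generate_protein <- function(size = 15, fasta = TRUE) {
  aa <- c("A","R","N","D","C","E","Q","G","H","I","L",
          "K","M","F","P","S","T","W","Y","V")
  aa_seq <- sample(aa, size = size, replace = TRUE)
  if (fasta) {
    return(paste(aa_seq, collapse = ""))
  } else {
    return(aa_seq)
  }
}

seq_lengths <- 6:12

# One sequence per requested length, returned as a character vector.
seqs <- sapply(seq_lengths, generate_protein)

# Hypothetical IDs so each sequence gets its own header line.
names(seqs) <- paste0("id.", seq_lengths)

# Build ">header" + newline + sequence for every entry, then print them.
fasta_lines <- paste0(">", names(seqs), "\n", seqs)
cat(fasta_lines, sep = "\n")
```

A stricter alternative is vapply(seq_lengths, generate_protein, character(1)), which raises an error if any call returns something other than a single string.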
Key-Point: Writing functions in R is doable but not the easiest thing. Starting with a working snippet of code and then using LLM tools to improve and generalize your function code is a productive approach.