How is the 5-character alphanumeric thing_id in a reddit URL generated (for example, wplf7 from http://redd.it/wplf7)?

Asked: June 8, 2026 (2026-06-08T02:12:45+00:00)

A reddit URL contains a 5-character alphanumeric thing_id (for example, “wplf7” in “http://redd.it/wplf7”), which is the base-36 encoding of a number.

wplf7 decodes to the number 54941875; that is what I have found so far. What I want to know is how 54941875 itself is generated.

I’m trying to scrape the comments of a specific reddit section (say, http://www.reddit.com/r/leagueoflegends/) using R, and I’m stuck on these 5-character alphanumeric ids.

Can anyone explain this in simple terms? Unfortunately Python is not my domain, and the 2000 lines of Python code listed on reddit’s website didn’t help me much.

Thanks,

1 Answer

Editorial Team · Answer 1 · 2026-06-08T02:12:47+00:00

First, set a reasonably unique user agent, as reddit prefers this:

options(HTTPUserAgent="My name is BOB")

I assume you want the content at http://www.reddit.com/r/leagueoflegends/. To get it as JSON, append .json to the URL:

library(RJSONIO)
library(RCurl)
# library(XML)

jdata<-getURL('http://www.reddit.com/r/leagueoflegends/.json')
jdata<-fromJSON(jdata)
# xdata<-getURL('http://www.reddit.com/r/leagueoflegends/.xml')
# xdata<-xmlParse(xdata)

The content is rich: for example, you can extract the domains, permalinks, authors, titles, ids and creation times of the posts:

Domains<-sapply(jdata[[2]]$children,function(x){x$data$domain})
permalinks<-sapply(jdata[[2]]$children,function(x){x$data$permalink})
authors<-sapply(jdata[[2]]$children,function(x){x$data$author})
titles<-sapply(jdata[[2]]$children,function(x){x$data$title})
ids<-sapply(jdata[[2]]$children,function(x){x$data$id})
created<-as.POSIXct(sapply(jdata[[2]]$children,function(x){x$data$created}),origin="1970/01/01")


> head(titles)
[1] "Pendragon 3-day-banning someone for randoming in ranked, or saying hes going to. Mixed feelings..."
[2] "Dig Kicks L0cust."                                                                                 
[3] "Summoners, y u no communicate??"                                                                   
[4] "Without Even Trying"                                                                               
[5] "Cross Country Tryndamere (Chaox Stream)"                                                           
[6] "Top 5 Flops - Episode 4 ft Dyrus, Phantoml0rd, and HatPerson vs Baron Nashor"                      
>
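The next step relies on a base36ToInteger helper that is not reproduced in this answer. As a minimal stand-in (an assumption, not necessarily the original function), base R's strtoi can decode base-36 strings, provided the result fits in a 32-bit integer, which holds for any 5-character id:

```r
# Hypothetical stand-in for the base36ToInteger helper used below.
# strtoi() is base R; it returns NA if the value overflows a 32-bit integer.
base36ToInteger <- function(x) {
  strtoi(x, base = 36L)
}

base36ToInteger("wplf7")  # 54941875, the number quoted in the question
```

Because strtoi is vectorised, the same function can be applied to the whole ids vector at once or through sapply.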

To investigate how these ids are generated, we can apply @Ben Bolker's base36ToInteger function to the ids we have gathered and compare the results against the times the posts were created, giving:

createData<-data.frame(created=created,ids=sapply(ids,base36ToInteger))
> dput(createData)
structure(list(created = structure(c(1342658844, 1342657298, 
1342622962, 1342643655, 1342641187, 1342654768, 1342665353, 1342640599, 
1342648272, 1342662822, 1342654185, 1342659591, 1342624350, 1342647907, 
1342637587, 1342591960, 1342625515, 1342642330, 1342651384, 1342668363, 
1342608976, 1342608165, 1342632545, 1342638611, 1342643489), class = c("POSIXct", 
"POSIXt")), ids = c(55047001, 55044612, 55010018, 55025557, 55022809, 
55040754, 55056689, 55022221, 55031424, 55053023, 55039810, 55048123, 
55010880, 55030934, 55019343, 54976515, 55011555, 55024060, 55035670, 
55061120, 54998192, 54997264, 55015528, 55020295, 55025363)), .Names = c("created", 
"ids"), row.names = c("wrujd", "wrsp0", "wr202", "wrdzp", "wrbvd", 
"wrppu", "ws20h", "wrbf1", "wriio", "wrz6n", "wrozm", "wrvej", 
"wr2o0", "wri52", "wr973", "wqc5f", "wr36r", "wrcu4", "wrlsm", 
"ws5fk", "wqsvk", "wqs5s", "wr694", "wr9xj", "wrdub"), class = "data.frame")

which implies that reddit generates these numbers sequentially, site-wide, as new posts are created.
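One way to check the sequential claim is sketched here on four of the id/timestamp pairs from the dput output above, with base R's strtoi doing the base-36 decoding. If ids are handed out in order, sorting by decoded id should also sort the creation times:

```r
# Four (id, created) pairs copied from the dput output above.
ids     <- c("wrujd", "wqc5f", "ws5fk", "wr202")
created <- c(1342658844, 1342591960, 1342668363, 1342622962)

# Decode the base-36 ids to integers with base R.
nums <- strtoi(ids, base = 36L)

# Order the creation times by decoded id; sequential ids should
# leave the timestamps in non-decreasing order.
byId <- created[order(nums)]
sequential <- !is.unsorted(byId)
sequential  # TRUE for this sample
```

This only tests a small sample; running the same check on the full createData frame would be the more convincing version.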

Without a more specific direction I will leave it at this, but hopefully you get the idea.
