I have an unfinished binary file that has some info that I can recover

Question


Asked: May 12, 2026

I have an unfinished binary file containing some information that I want to recover using a regex. Its contents are:

G $12.Angry.Men.1957.720p.HDTV.x264-HDLH Lhttp://site.com/forum/f89/12-angry-men-1957-720p-hdtv-x264-hdl-538403/ L I Š M ,ABBA.The.Movie.1977.720p.BluRay.DTS.x264-iONN Phttp://site.com/forum/f89/abba-movie-1977-720p-bluray-dts-x264-ion-428687/&

How can I parse it so that I can at least get the links, such as:

http://site.com/forum/f89/abba-movie-1977-720p-bluray-dts-x264-ion-428687/

where 428687 is the ID number, so that I end up with both the full link and its ID?

The names that come before the links are the link titles, for example:

ABBA.The.Movie.1977.720p.BluRay.DTS.x264-iON

I am not sure whether these can be parsed, though. I noticed that the links and the names all have one character immediately before and after them, so maybe this can narrow down the problem?

By the way, I am willing to offer a 500-point bounty for the correct answer.

1 Answer

Editorial Team · Answer 1 · 2026-05-12T18:39:56+00:00

Something like the following regular expression?

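using System.Text.RegularExpressions;

MatchCollection matches = Regex.Matches(yourString, @"http://\S+?-(\d+)/");
foreach (Match m in matches)
{
    string url = m.Value;          // the whole match is the link
    string id = m.Groups[1].Value; // first capturing group: the digits before the trailing slash
}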

This grabs links starting with http://, followed by any run of non-space characters (a valid URI cannot contain an unescaped space), and assumes each link ends with a dash, digits and a trailing slash. That last assumption also drops the & in your example, or any other text that directly follows the link.

EDIT: the whole match is the link and the ID is in the first capturing group (Groups[1], not Captures[0], which is the whole match again); I updated the code to show how to get both.
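For a quick check, here is a minimal, self-contained console sketch (C# top-level statements) that runs the same regex over the sample text from the question. Holding the sample in a string literal is an assumption for illustration; in practice the text would come from however the file is read, which is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample text copied from the question; stands in for the recovered file contents.
string yourString =
    "G $12.Angry.Men.1957.720p.HDTV.x264-HDLH Lhttp://site.com/forum/f89/12-angry-men-1957-720p-hdtv-x264-hdl-538403/ " +
    "L I Š M ,ABBA.The.Movie.1977.720p.BluRay.DTS.x264-iONN Phttp://site.com/forum/f89/abba-movie-1977-720p-bluray-dts-x264-ion-428687/&";

var links = new List<(string Url, string Id)>();
foreach (Match m in Regex.Matches(yourString, @"http://\S+?-(\d+)/"))
{
    // m.Value is the whole link; Groups[1] holds the digits before the trailing slash.
    links.Add((m.Value, m.Groups[1].Value));
}

foreach (var (url, id) in links)
{
    Console.WriteLine($"{id}  {url}");
}
```

With this sample it should print the IDs 538403 and 428687, each next to its full link; the trailing & is not part of the second match.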

Update: if dash+digits+slash can occur more than once in a URL, the greedy \S+ must be used so that the match runs to the last occurrence, but then consecutive links with no whitespace between them will be matched as one. If dash+digits+slash occurs only once per URL, the lazy \S+? is preferred; this is what the code above uses.
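The trade-off can be seen with two made-up inputs (the example.com URLs are hypothetical, not from the question): one where two links are glued together without whitespace, and one where a single URL contains dash+digits+slash twice. This sketch collects the IDs each variant captures.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string Greedy = @"http://\S+-(\d+)/";
const string Lazy = @"http://\S+?-(\d+)/";

// Returns the ID (first capturing group) of every match, in order.
static string[] Ids(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

// Hypothetical input 1: two links glued together with no whitespace between them.
string glued = "x http://example.com/f/a-111/http://example.com/f/b-222/ y";

// Hypothetical input 2: one link whose path contains "-digits/" twice.
string nested = "x http://example.com/f-12/post-345/ y";

string[] gluedGreedy = Ids(glued, Greedy);   // one match spanning both links
string[] gluedLazy = Ids(glued, Lazy);       // one match per link
string[] nestedGreedy = Ids(nested, Greedy); // backtracks only to the last "-digits/"
string[] nestedLazy = Ids(nested, Lazy);     // stops at the first "-digits/"

Console.WriteLine($"glued,  greedy: {string.Join(", ", gluedGreedy)}");
Console.WriteLine($"glued,  lazy:   {string.Join(", ", gluedLazy)}");
Console.WriteLine($"nested, greedy: {string.Join(", ", nestedGreedy)}");
Console.WriteLine($"nested, lazy:   {string.Join(", ", nestedLazy)}");
```

Greedy returns only 222 for the glued links (a single match spanning both) but the correct 345 for the nested URL; lazy returns 111 and 222 for the glued links but stops early at 12 for the nested URL.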

Alternative approach

From the updates and the extra information, I understand that a lot about the text is unclear. Another approach might be easier: split the text into pieces that each end with a link, and go through the results. This avoids having to write one complex lookahead/lookbehind regex over the whole text and makes sure that consecutive links (i.e., without text in between) are treated correctly:

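using System.Text.RegularExpressions;

// zero-width split: cut right after each URL, so every piece ends with one link
string[] linksWithText = Regex.Split(yourString, @"(?<=http:\S+-\d+/)");
foreach (string link in linksWithText)
{
    // Singleline lets (.*) also cross any newline bytes that may occur in binary data
    Match m = Regex.Match(link, @"(.*)(http:\S+-(\d+)/)$", RegexOptions.Singleline);
    if (m.Success)
    {
        string text = m.Groups[1].Value; // the name, plus any stray characters before the link
        string url = m.Groups[2].Value;
        string id = m.Groups[3].Value;
    }
}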

Update: the alternative approach has been updated. The text (name) comes first, then the URL. Note the positive lookbehind (?<=...), which splits at a zero-width position right after each URL, so each piece contains everything before a URL up to and including the URL itself.
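Here is a self-contained sketch of the alternative approach run on the sample text from the question, collecting (text, url, id) for each link. As before, the string literal is an assumption that stands in for the recovered file contents.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample text copied from the question.
string yourString =
    "G $12.Angry.Men.1957.720p.HDTV.x264-HDLH Lhttp://site.com/forum/f89/12-angry-men-1957-720p-hdtv-x264-hdl-538403/ " +
    "L I Š M ,ABBA.The.Movie.1977.720p.BluRay.DTS.x264-iONN Phttp://site.com/forum/f89/abba-movie-1977-720p-bluray-dts-x264-ion-428687/&";

var records = new List<(string Text, string Url, string Id)>();

// Zero-width split: cut right after every "http:...-digits/" so each piece ends with one URL.
foreach (string piece in Regex.Split(yourString, @"(?<=http:\S+-\d+/)"))
{
    // Singleline lets (.*) cross any newline bytes that may appear in binary data.
    Match m = Regex.Match(piece, @"(.*)(http:\S+-(\d+)/)$", RegexOptions.Singleline);
    if (m.Success)
    {
        records.Add((m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value));
    }
}

foreach (var (text, url, id) in records)
{
    Console.WriteLine($"[{text}] {id} {url}");
}
```

The final piece ("&") does not match and is skipped. Note that the text part still includes the stray characters mentioned in the question around each name (for the first link it is "G $12.Angry.Men.1957.720p.HDTV.x264-HDLH L"), so the name still needs some cleanup.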
