I have the following piece of Java code that reads strings from CSV file.

Asked: June 6, 2026 (2026-06-06T13:27:19+00:00)

I have the following piece of Java code that reads strings from a CSV file. It then splits the strings in order to check them and extract the “anyaddress” part that comes in the pattern “http://www.anyaddress.anything/”.

//Split each line of the file on commas, since it is a CSV file
    //Compile the pattern once, outside the loops
    Pattern regex = Pattern.compile(
        "(?<=http://www.)" + "[^/]*", Pattern.COMMENTS);

    while((Line=in.readLine())!=null)
    {
        strings = Line.split(",");

        //Check every field of the current line, not only those of the last line
        for(int i=0; i<strings.length; i++)
        {
            Matcher regexMatcher = regex.matcher(strings[i]);
            if (regexMatcher.find())
            {
               //Returns the input subsequence matched by the previous match.
               ResultString = regexMatcher.group();
               out.write(ResultString);
               out.newLine();
            }  //end if

        } //end for loop
    } //end while loop

    in.close();
    out.close();

Now I have found that my text file might contain strings in any of the following formats:
‘http://www.anyaddress.anything/’ OR ‘http://anyaddress.anything/’ OR ‘https://www.anyaddress.anything/’ OR ‘https://anyaddress.anything/’

I need to extract the “anyaddress” part only. I searched previous posts (“Can we check multiple patterns using regex in Java?”) and found that I only have to add “|”. But when, for example, I edited my regex to include the second pattern by changing it to:

Pattern regex = Pattern.compile(
        "(?<=http://www.) | (?<=http://)" + "[^/]*", Pattern.COMMENTS);

my program extracted the addresses as http://www.anyaddress.anything, while I need only the “anyaddress.anything/” part. On the other hand, the program now correctly extracts the addresses that do not have the “www.”, which it was not able to extract previously.

Can anybody clarify where my mistake is, and give me an example of how to include multiple patterns so that my program extracts the links in any of the 4 mentioned formats correctly?

1 Answer

Editorial Team · Answer 1 · 2026-06-06T13:27:21+00:00

I would avoid lookbehind because it is infrequently used and not necessary here. Also, I don’t know how it combines with alternation. Since you are parsing URLs, I would recommend using the URL or URI class, extracting the domain name, and then removing any leading ‘www’. If you still want to use regexes, try

Pattern.compile("https?://(?:www[.])?([^/]*)")

That reads:

“http”, plus an optional “s”, then colon slash slash, an optional “www.”, and a capture group of everything up to (but excluding) the next slash.

You then read the result using group(1), because it is the first capture group, rather than group(), which returns the whole match.
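To make both suggestions concrete, here is a minimal sketch rather than a definitive implementation. The class name HostExtractor and the method names hostByRegex and hostByUri are made up for illustration, and the sample strings are the four formats from the question. It assumes each CSV field holds a single URL; it uses java.net.URI for the non-regex route, and the regex route uses "://" after the scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class HostExtractor {

    // "http" or "https", then "://", an optional "www.", then a capture
    // group holding everything up to (but excluding) the next slash.
    private static final Pattern HOST =
        Pattern.compile("https?://(?:www[.])?([^/]*)");

    // Regex route: returns capture group 1, or null if no URL is found.
    public static String hostByRegex(String text) {
        Matcher m = HOST.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // URI route: parse the field, take the host, drop a leading "www.".
    // Returns null if the field is not a parseable URI or has no host.
    public static String hostByUri(String text) {
        try {
            String host = new URI(text.trim()).getHost();
            if (host == null) {
                return null;
            }
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.anyaddress.anything/",
            "http://anyaddress.anything/",
            "https://www.anyaddress.anything/",
            "https://anyaddress.anything/"
        };
        for (String s : samples) {
            // Each line prints "anyaddress.anything anyaddress.anything"
            System.out.println(hostByRegex(s) + " " + hostByUri(s));
        }
    }
}
```

In the loop from the question, this would mean replacing regexMatcher.group() with regexMatcher.group(1), or calling one of the helpers on strings[i].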
