I’m using ASP.NET/C# and I’m looking to create unique(?) URIs for a small CMS I am building.
I am generating the URI segment from my article’s title, so for example if the title is “My amazing article” the URI would be http://www.website.com/news/my-amazing-article
There are two parts to this. Firstly, which characters do you think I need to strip out? I am replacing spaces with “-” and I think I should strip out the “/” character too. Can you think of any more that might cause problems? “?” perhaps? Should I remove all non-alpha characters?
Second question: above I mentioned the URIs MAY need to be unique. I was going to check the existing URI list before adding a new one to ensure uniqueness, however I see Stack Overflow uses a number plus a URI segment. I assume this allows titles to be duplicated? Do you think this would be a better way?
Transform all diacritics into their base character, then strip anything that is not a letter, a digit, or a space (Char.IsLetterOrDigit covers the letter and digit checks). Then replace each run of spaces with a single dash.
This is what we use in our software.
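The steps above could be sketched roughly as follows. This is not our actual production code, just a minimal illustration; the names SlugHelper and ToSlug are made up for the example, and lower-casing the result and treating existing hyphens as separators are assumptions that go beyond what is described above.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only.
public static class SlugHelper
{
    public static string ToSlug(string title)
    {
        if (string.IsNullOrWhiteSpace(title)) return string.Empty;

        // FormD splits an accented letter into its base letter plus combining marks.
        string decomposed = title.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        bool pendingDash = false;

        foreach (char c in decomposed)
        {
            // Drop the combining marks produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            if (char.IsLetterOrDigit(c))
            {
                // Emit one dash for any run of separators, never at the start.
                if (pendingDash && sb.Length > 0) sb.Append('-');
                pendingDash = false;
                sb.Append(char.ToLowerInvariant(c)); // assumption: lower-case slugs
            }
            else if (char.IsWhiteSpace(c) || c == '-') // assumption: hyphens also separate words
            {
                pendingDash = true;
            }
            // Any other character ("/", "?", "#", ...) is simply dropped.
        }

        return sb.ToString();
    }
}
```

With this sketch, SlugHelper.ToSlug("My amazing article") gives "my-amazing-article". Note that Char.IsLetterOrDigit also accepts non-Latin letters, so add an explicit ASCII range check if you want ASCII-only slugs.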
Concerning using those generated names as unique IDs, I would advise against it. Use the generated name as an SEO helper, but not as the key for resolving the page. If you look at how Stack Overflow references their pages:
You can find the ID there. These two URLs point to the same page:
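To illustrate resolving by ID rather than by name, here is a minimal sketch. It assumes a hypothetical URL layout of /news/{id}/{slug} for the CMS in the question; ArticleRoute and TryGetArticleId are made-up names, and this is not how Stack Overflow itself is implemented.

```csharp
using System;
using System.Globalization;

// Hypothetical route parser for URLs shaped like /news/{id}/{slug}.
public static class ArticleRoute
{
    public static bool TryGetArticleId(string path, out int id)
    {
        id = 0;
        if (string.IsNullOrEmpty(path)) return false;

        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        // Only the numeric segment identifies the article; the slug that may
        // follow it is ignored, so duplicate or edited titles are harmless.
        return segments.Length >= 2
            && string.Equals(segments[0], "news", StringComparison.OrdinalIgnoreCase)
            && int.TryParse(segments[1], NumberStyles.None, CultureInfo.InvariantCulture, out id);
    }
}
```

Under that assumed layout, "/news/42/my-amazing-article" and "/news/42/some-other-text" both resolve to article 42, so the title can change or be duplicated without breaking links.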