The Archive Base

Editorial Team
Asked: May 12, 2026

I’m trying to create a method that provides best effort parsing of decimal inputs

I’m trying to create a method that provides “best effort” parsing of decimal inputs in cases where I do not know which of these two mutually exclusive ways of writing numbers the end-user is using:

  • “.” as thousands separator and “,” as decimal separator
  • “,” as thousands separator and “.” as decimal separator

The method is implemented as parse_decimal(..) in the code below. Furthermore, I’ve defined 20 test cases that show how the heuristics of the method should work.

While the code below passes the tests, it is quite horrible and unreadable. I’m sure there is a more compact and readable way to implement the method, possibly with smarter use of regexes.

My question is simply: given the code below and the test cases, how would you improve parse_decimal(…) to make it more compact and readable while still passing the tests?

Clarifications:

  • Clarification #1: As pointed out in the comments, the case ^\d{1,3}[\.,]\d{3}$ is ambiguous: one cannot logically determine which character is the thousands separator and which is the decimal separator. In ambiguous cases we’ll simply assume that US-style decimals are used: “,” as thousands separator and “.” as decimal separator.
  • Clarification #2: If you believe that any of the test cases is wrong, please state which tests should be changed and how.
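
As a rough illustration of Clarification #1, the ambiguous shape can be detected with the pattern itself. The helper name is_ambiguous is hypothetical and not part of the code in question; the sketch uses \A and \z instead of ^ and $ so that a trailing newline does not slip through. Both 12,345 and 12.345 have this shape, and the test cases below expect each of them to parse as 12345.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the input is one to three digits, a single
# "." or ",", and exactly three digits, so the separator's role is unclear.
sub is_ambiguous {
    my $input = shift;
    return $input =~ /\A\d{1,3}[.,]\d{3}\z/ ? 1 : 0;
}

# Report which sample inputs have the ambiguous shape.
for my $input ( "12,345", "12.345", "1234,567", "12,345.67" ) {
    printf "%-10s %s\n", $input,
        is_ambiguous($input) ? "ambiguous" : "unambiguous";
}
```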

The code in question including the test cases:

#!/usr/bin/perl -wT

use strict;
use warnings;
use Test::More tests => 20;

ok(&parse_decimal("1,234,567") == 1234567);
ok(&parse_decimal("1,234567") == 1.234567);
ok(&parse_decimal("1.234.567") == 1234567);
ok(&parse_decimal("1.234567") == 1.234567);
ok(&parse_decimal("12,345") == 12345);
ok(&parse_decimal("12,345,678") == 12345678);
ok(&parse_decimal("12,345.67") == 12345.67);
ok(&parse_decimal("12,34567") == 12.34567);
ok(&parse_decimal("12.34") == 12.34);
ok(&parse_decimal("12.345") == 12345);
ok(&parse_decimal("12.345,67") == 12345.67);
ok(&parse_decimal("12.345.678") == 12345678);
ok(&parse_decimal("12.34567") == 12.34567);
ok(&parse_decimal("123,4567") == 123.4567);
ok(&parse_decimal("123.4567") == 123.4567);
ok(&parse_decimal("1234,567") == 1234.567);
ok(&parse_decimal("1234.567") == 1234.567);
ok(&parse_decimal("12345") == 12345);
ok(&parse_decimal("12345,67") == 12345.67);
ok(&parse_decimal("1234567") == 1234567);

sub parse_decimal($) {
    my $input = shift;
    $input =~ s/[^\d,\.]//g;
    if ($input !~ /[,\.]/) {
        return &parse_with_separators($input, '.', ',');
    } elsif ($input =~ /\d,\d+\.\d/) {
        return &parse_with_separators($input, '.', ',');
    } elsif ($input =~ /\d\.\d+,\d/) {
        return &parse_with_separators($input, ',', '.');
    } elsif ($input =~ /\d\.\d+\.\d/) {
        return &parse_with_separators($input, ',', '.');
    } elsif ($input =~ /\d,\d+,\d/) {
        return &parse_with_separators($input, '.', ',');
    } elsif ($input =~ /\d{4},\d/) {
        return &parse_with_separators($input, ',', '.');
    } elsif ($input =~ /\d{4}\.\d/) {
        return &parse_with_separators($input, '.', ',');
    } elsif ($input =~ /\d,\d{3}$/) {
        return &parse_with_separators($input, '.', ',');
    } elsif ($input =~ /\d\.\d{3}$/) {
        return &parse_with_separators($input, ',', '.');
    } elsif ($input =~ /\d,\d/) {
        return &parse_with_separators($input, ',', '.');
    } elsif ($input =~ /\d\.\d/) {
        return &parse_with_separators($input, '.', ',');
    } else {
        return &parse_with_separators($input, '.', ',');
    }
}

sub parse_with_separators($$$) {
    my $input = shift;
    my $decimal_separator = shift;
    my $thousand_separator = shift;
    my $output = $input;
    $output =~ s/\Q${thousand_separator}\E//g;
    $output =~ s/\Q${decimal_separator}\E/./g;
    return $output;
}
1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 2:32 pm

    The idea in these problems is to look at the code and figure out where you typed anything twice. When you see that, work to remove it. My program handles everything in your test data, and I don’t have to repeat program logic structures to do it. That lets me focus on the data rather than program flow.

    First, let’s clean up your tests. You really have a set of pairs that you want to test, so let’s put them into a data structure. You can add or remove items from the data structure as you like, and the tests will automatically adjust:

    use Test::More 'no_plan';
    
    my @pairs = (
         #  input        expected
        [ "1,234,567",  1234567  ],
        [ "1,234567",   1.234567 ],
        [ "1.234.567",  1234567  ],
        [ "1.234567",   1.234567 ],
        [ "12,345",     12345    ],
        [ "12,345,678", 12345678 ],
        [ "12,345.67",  12345.67 ],
        [ "12,34567",   12.34567 ],
        [ "12.34",      12.34    ],
        [ "12.345",     12345    ],  # odd case!
        [ "12.345,67",  12345.67 ],
        [ "12.345.678", 12345678 ],
        [ "12.34567",   12.34567 ],
        [ "123,4567",   123.4567 ],
        [ "123.4567",   123.4567 ],
        [ "1234,567",   1234.567 ],
        [ "1234.567",   1234.567 ],
        [ "12345",      12345    ],
        [ "12345,67",   12345.67 ],
        [ "1234567",    1234567  ],
    );
    

    Now that you have it in a data structure, your long line of tests reduces to a short foreach loop:

    foreach my $pair ( @pairs ) {
         my( $original, $expected ) = @$pair;
         my $got = parse_number( $original );
         # compare numerically, as the original tests did with ==
         cmp_ok( $got, '==', $expected, "$original translates to $expected" );
         }
    

    The parse_number routine (my version of your parse_decimal) likewise condenses into simple code. The trick is to find what you are doing over and over again in the source and stop doing it. Instead of juggling awkward calling conventions and long chains of conditionals, I normalize the data: I figure out which cases are odd, then turn them into not-odd cases. In this code, I condense all of the knowledge about the separators into a handful of regexes and return one of two possible lists that tell me which character is the thousands separator and which is the decimal separator. Once I have that, I remove the thousands separator completely and turn the decimal separator into a full stop. As I find more cases, I merely add a regex that returns true for that case:

    sub parse_number
        {
        my $string = shift;
    
        my( $separator, $decimal ) = do {
            local $_ = $string;
            if( 
                /\.\d\d\d\./           || # two dots
                /\.\d\d\d,/            || # dot before comma
                /,\d{4,}/              || # comma with many following digits
                /\d{4,},/              || # comma with many leading digits
                /^\d{1,3}\.\d\d\d\z/   || # odd case of 123.456
                0
                )
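                { ( '.', ',' ) }   # "." groups thousands, "," is the decimal mark
            else { ( ',', '.' ) }  # default: "," groups thousands, "." is the decimal mark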
            };
    
        $string =~ s/\Q$separator//g;
        $string =~ s/\Q$decimal/./;
    
        $string;
        }
    

    This is the sort of thing I talk about in the dynamic subroutines chapter of Mastering Perl. Although I won’t go into it here, I would probably turn that series of regexes into a pipeline of some sort and use a grep.
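
One way that pipeline idea might look is sketched below. This is only an illustration, not the version from the book: the names @european_patterns and parse_number_table are hypothetical, and the patterns are copied from parse_number above. The list holds every regex that signals "." as the thousands separator, and grep in scalar context counts how many of them match.

```perl
use strict;
use warnings;

# Hypothetical table of patterns; a match on any of them means "." is the
# thousands separator and "," is the decimal separator.
my @european_patterns = (
    qr/\.\d\d\d\./,          # two dots
    qr/\.\d\d\d,/,           # dot before comma
    qr/,\d{4,}/,             # comma with many following digits
    qr/\d{4,},/,             # comma with many leading digits
    qr/^\d{1,3}\.\d\d\d\z/,  # odd case of 123.456
);

sub parse_number_table {
    my $string = shift;

    # grep in scalar context returns the number of matching patterns.
    my $is_european = grep { $string =~ $_ } @european_patterns;

    my( $separator, $decimal ) = $is_european ? ( '.', ',' ) : ( ',', '.' );

    $string =~ s/\Q$separator\E//g;   # drop every thousands separator
    $string =~ s/\Q$decimal\E/./;     # normalize the decimal separator

    return $string;
}
```

With this layout, handling a new case means adding one qr// entry to the list; the subroutine body stays the same.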

    This is just the part of the program that passes your tests. I’d add another step to verify that the number is an expected format to deal with dirty data, but that’s not so hard and is just a simple matter of programming.
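
A minimal sketch of such a verification step follows, assuming that "expected format" means digit groups joined by single "." or "," characters on the way in and a plain decimal on the way out. Both helper names are hypothetical, and real dirty data (signs, currency symbols, whitespace) would need rules of its own.

```perl
use strict;
use warnings;

# Hypothetical pre-check: digits, optionally followed by more digit groups,
# each introduced by a single "." or ",".
sub looks_like_separated_number {
    my $string = shift;
    return 0 unless defined $string;
    return $string =~ /\A\d+(?:[.,]\d+)*\z/ ? 1 : 0;
}

# Hypothetical post-check: after normalization only digits and at most one
# full stop should remain.
sub is_plain_decimal {
    my $string = shift;
    return 0 unless defined $string;
    return $string =~ /\A\d+(?:\.\d+)?\z/ ? 1 : 0;
}
```

The pre-check would run before the separators are guessed and the post-check on the normalized string, so an input such as 1,234.56.78 is rejected rather than silently returned as 1234.56.78.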
