Asked by Editorial Team on May 13, 2026

I am looping through a big data file and would like to detect the type of each column


I am looping through a big data file and would like to detect the type of the variable in each column,
e.g. whether it is an integer or a float.
It works fine, but at the moment it is still very basic and I would like to add another idea.
So far the type of each variable is decided from the second row of the data set.
(The first row is used as the header.)
Here is the beginning of the code:

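#!/usr/bin/perl

use warnings;
use diagnostics;
use Getopt::Std;

# Getopt::Std stores -i, -s and -t in these package variables
our ($opt_i, $opt_s, $opt_t);
getopts("i:s:t:") or die "bad options\n";

# -i: input file (required); read the header and the first data row
if ($opt_i) {
    open INFILE, '<', $opt_i or die "cannot open $opt_i: $!\n";
    chomp($headerline = <INFILE>);
    chomp($second = <INFILE>);
} else {
    die "the input file has to be given\n";
}

# -t: table name, defaulting to the file name without its extension
if ($opt_t) {
    $tablename = $opt_t;
} else {
    $tablename = $opt_i;
    $tablename =~ s/\.\w+$//;
}

# -s: field separator, defaulting to a comma
if ($opt_s) {
    $sep = $opt_s;
} else {
    $sep = ",";
}

# Strip quotes and turn dots in the column names into underscores
$headerline =~ s/"//g;
$headerline =~ s/\./_/g;
@header = split /$sep/, $headerline;

$second =~ s/"//g;
@second = split /$sep/, $second;
@terms  = split /$sep/, $second;
@types  = split /$sep/, $second;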

I have now implemented a small loop.
The problem is that I don't know how to handle the missing values, which are marked as NULL. At the moment the loop simply assigns "" (i.e. nothing) to $vartype[$j].

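# Decide each column's type from its value in the second row
$j = 0;
while ($j <= $#second) {
    if ($types[$j] =~ /^NULL$/) {
        $vartype[$j] = "";            # missing value: no type yet
    } elsif ($types[$j] =~ /[A-Za-z]/) {
        $vartype[$j] = "varchar";     # contains letters
    } elsif ($types[$j] =~ /\./) {
        $vartype[$j] = "double";      # contains a decimal point
    } else {
        $vartype[$j] = "int";
    }
    $j++;
}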

So how can I add another loop to the existing one so that, whenever a column has a NULL value, it reads the next value in that same column and keeps going until it finds a number or a word?

Here is a sample of my data:

Country.Name        Time.Name  AG.LND.AGRI.ZS   NY.GDP.MKTP.CD   NE.IMP.GNFS.ZS
Brunei Darussalam   1960       NULL             1139121335.16    3.46
Brunei Darussalam   1960       NULL             1677595756.64    0.9
Brunei Darussalam   1960       NULL             1488339328.59    4.19
Brunei Darussalam   1961       3.98             1869828587.8     3.14
Brunei Darussalam   1961       3.98             2346769422.22    3.38
Brunei Darussalam   1961       3.98             2363109706.3     3.17

As mentioned, the loop only uses the second row to decide on the type of each variable.

I would now like to add another loop so that, e.g. in the third column (AG.LND.AGRI.ZS), it goes down the column until it finds the first real value, in this case 3.98. At the moment the loop recognizes the missing value marked NULL and just assigns an empty value.

1 Answer

  1. Editorial Team, answered on May 13, 2026 at 3:02 pm

    I am having a hard time figuring out what you are trying to do. Assuming you are trying to guess column types based on column contents, here is a way to do it. The key points are: don't set anything when a field is NULL, skip a field whose type has already been decided, and leave the loop as soon as every field's type has been determined.

    #!/usr/bin/perl
    
    use strict; use warnings;
    use Scalar::Util qw(looks_like_number);
    
    # The first line holds the column names
    my @names = split ' ', scalar <DATA>;
    my @types;
    
    while ( <DATA> ) {
        chomp;
        # Fields are separated by two or more spaces
        my @values = split / {2,}/;
    
        for my $i ( 0 .. $#values ) {
            next if defined $types[$i];    # type already decided
            my $val = $values[$i];
            next if $val eq 'NULL';        # missing value: try the next row
            if ( $val =~ /^[0-9]+\z/ ) {
                $types[$i] = 'int';
            }
            elsif ( $val =~ /^[0-9.]+\z/
                    and looks_like_number($val) ) {
                $types[$i] = 'double';
            }
            else {
                $types[$i] = 'varchar';
            }
        }
        # Stop once every column has a type. Checking by index means a
        # trailing column that has only been NULL so far is not skipped.
        last unless grep { not defined $types[$_] } 0 .. $#names;
    }
    
    print "$_\n" for @types;
    
    
    __DATA__
    Country.Name        Time.Name  AG.LND.AGRI.ZS   NY.GDP.MKTP.CD   NE.IMP.GNFS.ZS
    Brunei Darussalam   1960       NULL             1139121335.16    3.46
    Brunei Darussalam   1960       NULL             1677595756.64    0.9
    Brunei Darussalam   1960       NULL             1488339328.59    4.19
    Brunei Darussalam   1961       3.98             1869828587.8     3.14
    Brunei Darussalam   1961       3.98             2346769422.22    3.38
    Brunei Darussalam   1961       3.98             2363109706.3     3.17
    

    Output:

    varchar
    int
    double
    double
    double
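    One caveat with stopping at the first non-NULL value: a column whose first value happens to be 3 but which later holds 3.5 is classified as int, and a column that is NULL in every row never gets a type at all. Below is a minimal sketch of an alternative, not part of the answer above: it scans every row and only ever widens a column's type (int, then double, then varchar), and falls back to varchar for all-NULL columns. The helper names guess_type and column_types, treating an empty field as missing, and the varchar fallback are all assumptions for illustration. The trade-off is that it gives up the early exit, so it always reads the whole file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify a single field.
# Returns undef for a missing value so the caller can skip it.
sub guess_type {
    my ($val) = @_;
    return if $val eq 'NULL' or $val eq '';    # treating '' as missing is an assumption
    return 'int' if $val =~ /^-?[0-9]+\z/;
    return 'double'
        if $val =~ /^-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?\z/;
    return 'varchar';
}

# Widening order: a column's type only ever moves to a higher rank.
my %rank = (int => 0, double => 1, varchar => 2);

# Hypothetical helper: read the header and every data row from $fh,
# splitting on the regex $sep, and return one type per column.
sub column_types {
    my ($fh, $sep) = @_;
    chomp(my $header = <$fh>);
    my @names = split /$sep/, $header;
    my @types;
    while (my $line = <$fh>) {
        chomp $line;
        my @values = split /$sep/, $line;
        for my $i (0 .. $#values) {
            my $t = guess_type($values[$i]);
            next unless defined $t;    # NULL: rely on later rows
            $types[$i] = $t
                if !defined $types[$i] or $rank{$t} > $rank{ $types[$i] };
        }
    }
    # Columns that were NULL in every row fall back to varchar (an assumption).
    return map { $types[$_] // 'varchar' } 0 .. $#names;
}

# The question's sample, whose fields are separated by two or more spaces.
my $sample = <<'END';
Country.Name        Time.Name  AG.LND.AGRI.ZS   NY.GDP.MKTP.CD   NE.IMP.GNFS.ZS
Brunei Darussalam   1960       NULL             1139121335.16    3.46
Brunei Darussalam   1960       NULL             1677595756.64    0.9
Brunei Darussalam   1960       NULL             1488339328.59    4.19
Brunei Darussalam   1961       3.98             1869828587.8     3.14
Brunei Darussalam   1961       3.98             2346769422.22    3.38
Brunei Darussalam   1961       3.98             2363109706.3     3.17
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @types = column_types($fh, qr/ {2,}/);
print "$_\n" for @types;    # prints varchar, int, double, double, double (one per line)
```

    For the comma-separated input in the question, you would instead open the file named by -i and pass a pattern such as qr/,/ as the separator.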
