Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8232283
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 7, 20262026-06-07T17:46:57+00:00 2026-06-07T17:46:57+00:00

Possible Duplicate: SQL Query JOIN with Table CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable1

  • 0

Possible Duplicate:
SQL Query JOIN with Table

CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable1   (This is the MAIN table through which comparisons need to be made)
(
BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING
)

And this is the data in the above first table

**BUYER_ID**  |  **ITEM_ID**     |      **CREATED_TIME**   
--------------+------------------+-------------------------
 1015826235      220003038067       *2001-11-03 19:40:21*
 1015826235      300003861266        2001-11-08 18:19:59
 1015826235      140002997245        2003-08-22 09:23:17
 1015826235     *210002448035*       2001-11-11 22:21:11

This is Second table in Hive- It also contains information about the items we are purchasing.

CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
)

And this is the data in the above second table (TestingTable2)-

**USER_ID**    **PURCHASED_ITEM**
1015826235     [{"product_id":220003038067,"timestamps":"1004941621"},    {"product_id":300003861266,"timestamps":"1005268799"},    {"product_id":140002997245,"timestamps":"1061569397"},{"product_id":200002448035,"timestamps":"1005542471"}]

Compare TestingTable2 with TestingTable1 so that below scenario is fulfilled.

Find the PRODUCT_ID AND TIMESTAMPS from TestingTable2 WHICH IS NOT MATCHING WITH ITEM_ID AND CREATED_TIME from TestingTable1 CORRESPONDING TO BUYER_ID(USER_ID) after comparing from TestingTable1.

So If you look TestingTable2 data this(last) ITEM_ID 210002448035 from TestingTable1 is not matching with TestingTable2 PRODUCT_ID- 200002448035 data and similarly with timestamps. So I want to show the below result using the HiveQL query.

**BUYER_ID**  |  **ITEM_ID**     |      **CREATED_TIME**          |  **PRODUCT_ID**    |     **TIMESTAMPS** 
--------------+------------------+--------------------------------+------------------------+----------------------
1015826235          *210002448035*           2001-11-11 22:21:11            200002448035           1005542471 
1015826235       220003038067           *2001-11-03 19:40:21*           220003038067          1004941621

Can anyone help me with this. As I am new to HiveQL so having lot of Problem.

Updated:-

I have written this query, but it doesn’t working the way I wanted to.

select * from 
  (select * from 
     (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps 
      from testingtable2 LATERAL VIEW
      explode(purchased_item) exploded_table as prod_and_ts)
      prod_and_ts 
      LEFT OUTER JOIN testingtable1 
  ON ( prod_and_ts.user_id = testingtable1.buyer_id AND testingtable1.item_id =    prod_and_ts.product_id
     AND prod_and_ts.timestamps = UNIX_TIMESTAMP (testingtable1.created_time)
  )
  where testingtable1.buyer_id IS NULL) 
  set_a LEFT OUTER JOIN testingtable1 
  ON (set_a.user_id = testingtable1.buyer_id AND  
  ( set_a.product_id = testingtable1.item_id OR set_a.timestamps = UNIX_TIMESTAMP(testingtable1.created_time) )
 );

One More UPDATE

As per user1166147 comments. I wrote my query as per his query.
In hive I guess INNER JOIN are written by simply JOIN.

This is my below query.

select * from (select t2.buyer_id, t2.item_id, t2.created_time as created_time, subq.user_id, subq.product_id, subq.timestamps as timestamps 
from
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id 
AND subq.timestamps = unix_timestamp(t2.created_time)
WHERE (subq.product_id <> t2.item_id)
union all
select t2.buyer_id, t2.item_id as item_id, t2.created_time, subq.user_id, subq.product_id as product_id, subq.timestamps
from 
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id 
    and subq.product_id = t2.item_id 
    WHERE (subq.timestamps <> unix_timestamp(t2.created_time))) unionall;

And after running the above query, I am getting zero result back.

One Final UPDATE:-

My Bad, I didn’t have the accurate data in the tables so that is the reason I was not getting the result back. Yes it is working the actual above query.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-07T17:46:58+00:00Added an answer on June 7, 2026 at 5:46 pm

    EDIT – PART 1
    Okay – For some reason I am going to explain myself – so to start with I stumbled upon this question because of the SQL tag, and saw Hive, and started to not look and just skip it. BUT then I noticed it had been over a day and you had gotten no answers. I looked – I saw a SQL logic correction in the original query posted that I knew would be needed and would help, so I posted ONLY because no one had answered. I will try to address this last question – but after that I am keeping my advice to myself, as I may be giving bad advice. Good luck! I tried! And you seem to be getting answers now, so…

    In TSQL, I could solve this entire problem with the below single query:

    SELECT * 
    FROM SO_Table1HIVE A
    FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)
    

    It would return everything, including your match buyer_id/user_id only. It won’t match a buyer_id/user_id row with no matches in either time or product in the other table, but it will return it as a separate row with NULLS in the other table’s fields. I would not match these any way – there is no accurate information provided to do it with as explained below.

    END EDIT PART 1

    If you can’t do FULL OUTER JOIN with OR in Hive, the simplest way to meet the original criteria is to UNION ALL 2 INNER JOINs. On one of the queries, in addition to joining the matching user_ids, join on the PRODUCT_ID AND in your WHERE look for TIMESTAMPS that don’t match CREATED_TIME. On the second query, in addition to joining the matching user_ids, join on the times AND in your WHERE look for products that don’t match.

    EDIT PART 2 – UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA

    If I understand the last criteria it is any record in either table that has a matching user_id = buyer_id, but nothing else matches. The FULL OUTER JOIN with OR condition will return them, but there isn’t enough provided info for a way to relate the records to each other. We can easily identify them, but have no way to tie them back to each other. If you do so and you have more than one record without a match in either OR both tables, there are going to be multiple entries for each.

    Any query I wrote to try to tie them without more info (and probably with) would be a guess and inaccurate.

    For example, in the first table if there were these 2 (sample fake) records with nothing matching in the second except user_id:

    1015826235  420003038067    2011-11-03 19:40:21.000
    1015826235  720003038067    2004-11-03 19:40:21.000
    

    AND in table2 – these non matching:

    1015826235  {"product_id":520003038067,"timestamps":"10...
    1015826235  {"product_id":620003038067,"timestamps":"10...
    

    You can identify them, but if you match them without more criteria you get 4 instead of 2:

    1015826235  420003038067    2011-11-03 19:40:21.000 1015826235 520003038067
    1015826235  420003038067    2011-11-03 19:40:21.000 1015826235 620003038067
    1015826235  720003038067    2004-11-03 19:40:21.000 1015826235 520003038067
    1015826235  720003038067    2004-11-03 19:40:21.000 1015826235 620003038067
    

    My suggestion would be simply to identify them and show them, as below.

    BUYER_ID        ITEM_ID      CREATED_TIME           USER_ID PRODUCTID   timestamps  
    ----------------------------------------------------------------------
    NULL            NULL         NULL                   1015826235  520003038067    2009-11-11 22:21:11.000
    NULL            NULL         NULL                   1015826235  620003038067    2008-11-11 22:21:11.000
    1015826235      420003038067 2011-11-03 19:40:21.000    NULL    NULL    NULL    
    1015826235      720003038067 2004-11-03 19:40:21.000    NULL    NULL    NULL    
    

    END EDIT PART 2 – UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA – PART 1

    I am working with TSQL, so I can’t test for you an exact query with your syntax, but the concepts of the joins are the same, and this will return what you want. I did take your query and attempt your syntax, modify as needed. I tested in TSQL. You may be able to take this and improve upon it with functionality in HiveQL. There are other ways to do this – but this is the most straightforward and this will translate to HiveQL.

    REMOVED, YOU GOT THIS PART AND IT IS INCLUDED LATER

    (Again modify syntax as needed)**

    SELECT *
    FROM (
        SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS 
        FROM testingtable2 LATERAL VIEW
            explode(purchased_item) exploded_table as prod_and_ts)
            prod_and_ts 
        INNER JOIN table2 A  ON A.BUYER_ID = prod_and_ts.[USER_ID] AND prod_and_ts.timestamps = UNIX_TIMESTAMP (table2.created_time) 
        WHERE prod_and_ts.product_id <> A.ITEM_ID
        UNION ALL
        SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS 
        FROM testingtable2 LATERAL VIEW
                explode(purchased_item) exploded_table as prod_and_ts)
                prod_and_ts 
        INNER JOIN table2 A  ON A.BUYER_ID = prod_and_ts.[USER_ID] AND prod_and_ts.product_id = A.ITEM_ID
        WHERE  prod_and_ts.timestamps <> UNIX_TIMESTAMP (table2.created_time) 
    ) X
    

    And here is my tested TSQL version with my table names for reference:

    SELECT * 
    FROM(
        SELECT *
        FROM SO_Table1HIVE A
        INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND B.t1time = A.Created_TIME 
        WHERE B.PRODUCTID <> A.ITEM_ID
        UNION ALL
        SELECT * 
        FROM SO_Table1HIVE A
        INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND B.PRODUCTID = A.ITEM_ID  
        WHERE B.t1time <> A.Created_TIME  
     ) X
    

    *EDIT PART 3 – UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA -PART 2

    In TSQL the entire query (no unions) can be run using a FULL OUTER JOIN with an OR condition on the join

    SELECT * 
    FROM SO_Table1HIVE A
    FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)
    

    If you can’t simply do the above, For the SQL logic for the new criteria – to grab those that don’t match from both tables and display them as NULL in the other table use RIGHT JOIN and LEFT JOIN.
    RIGHT JOIN will grab anything in the first table the matches the second and everything in the second, and LEFT does the opposite. Add the new queries to your UNION.

    TSQL EXAMPLE – MODIFY FOR HIVE

    SELECT * 
    FROM SO_Table1HIVE A
    RIGHT JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME    OR B.PRODUCTID = A.ITEM_ID)
    WHERE A.BUYER_ID IS NULL 
    UNION ALL
    SELECT * 
    FROM SO_Table1HIVE A
    LEFT JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR    B.PRODUCTID = A.ITEM_ID)
    WHERE B.[USER_ID] IS NULL
    

    Or, If you wanted to grab them and match them as duplicates add to UNION:

    TSQL

    SELECT * 
    FROM SO_Table1HIVE A
    JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] 
    WHERE B.t1time NOT IN(SELECT Created_TIME FROM SO_Table1HIVE)
    AND A.Created_TIME  NOT IN(SELECT t1time FROM SO_Table2HIVE) 
    AND B.PRODUCTID NOT IN(SELECT ITEM_ID FROM SO_Table1HIVE)
    AND A.ITEM_ID NOT IN(SELECT PRODUCTID FROM SO_Table2HIVE) 
    

    Again, Good luck!

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Possible Duplicate: SQL Query JOIN with Table If this is the data in TestingTable1
Possible Duplicate: SQL query to find Missing sequence numbers I have a table which
Possible Duplicate: Writing an SQL query to SELECT item from the following table I
Possible Duplicate: SQL Comments on Create Table on SQL Server 2008 I just want
Possible Duplicate: How do I Create a Comma-Separated List using a SQL Query? I
im trying to create a sql query, that will detect (possible) duplicate customers in
Possible Duplicate: Is there a way to create a SQL Server function to "join"
Possible Duplicate: Is there a way to create a SQL Server function to "join"
Possible Duplicate: Parent Child table record - Building SQL query Here is my table
Possible Duplicate: SQL Server dynamic PIVOT query? I have temporary table with following structure:

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.