
HBase versioning

A version is a timestamp written alongside each value.

 By default, the timestamp is the time on the RegionServer when the data was written, but a different timestamp can be specified when we put data into the cell.
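To make the difference between the default and an explicit timestamp concrete, here is a minimal Python sketch. It is only a toy model, not the HBase client API: the `put` function and the dict-based `cell` are hypothetical names made up for illustration, and the local clock stands in for the RegionServer time.

```python
import time

def put(cell, value, timestamp=None):
    """Write a value into a cell, modelled here as a dict of timestamp -> value."""
    if timestamp is None:
        # No timestamp supplied: fall back to the current time in milliseconds,
        # which stands in for the RegionServer clock in this sketch.
        timestamp = int(time.time() * 1000)
    cell[timestamp] = value
    return timestamp

cell = {}
auto_ts = put(cell, "bela")                                # timestamp chosen for us
explicit_ts = put(cell, "bini", timestamp=1588774360746)   # timestamp chosen by the caller
```

In this model, as in HBase, each distinct timestamp identifies a separate version of the same cell.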

 HBase is built on top of HDFS, so it does not overwrite values in place during an update; instead, it stores each new value together with the time at which it was written. To enable this, HBase creates a new version of each cell being updated. The default maximum number of versions kept is 1; it was 3 in earlier releases (prior to HBase 0.96).

 You can specify the number of versions to keep for any given column family when creating the HBase table, or later with the alter command.

 Versioning also helps to maintain the history of table values.


Let's create an HBase table named student, in the student_management namespace, with two column families: personal_details and marks_details.

 hbase(main):001:0> create 'student_management:student','personal_details','marks_details'                                             

0 row(s) in 1.3920 seconds     

· Alter the HBase table so that the personal_details column family keeps up to 4 versions.

 hbase(main):041:0> alter 'student_management:student' , NAME => 'personal_details' , VERSIONS => 4                                    

Updating all regions with the new schema...                                                                                           

1/1 regions updated.                                                                                                                  


0 row(s) in 1.8990 seconds   

 · Now, insert the first record into the HBase table, whose personal_details column family is declared with 4 versions.


hbase(main):042:0> put 'student_management:student','1','personal_details:name','bela'                                                

0 row(s) in 0.1180 seconds                                                                                                            

 · Update the value of the name column.


hbase(main):043:0> put 'student_management:student','1','personal_details:name','bini'

0 row(s) in 0.0050 seconds                                                                                                            



· Now, when checking the value of the name column, the most recently updated value appears at the top.


*Note: In HBase, row and column keys are expressed as bytes, while the version is specified using a long integer. The HBase version dimension is stored in decreasing order, so that when reading from a store file, the most recent values are found first.

hbase(main):044:0> get 'student_management:student','1',{ COLUMN => 'personal_details:name', VERSIONS => 3}                           

COLUMN                             CELL                                                                                               

 personal_details:name             timestamp=1588774360746, value=bini                                                                

 personal_details:name             timestamp=1588774335979, value=bela                                                                

2 row(s) in 0.0030 seconds                                                
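The newest-first ordering seen in this output can be sketched in a few lines of Python. This is an illustrative model only: `get_versions` is a hypothetical helper, not an HBase call, and the cell is again a plain dict of timestamp -> value using the two timestamps shown above.

```python
def get_versions(cell, versions=1):
    """Return up to `versions` (timestamp, value) pairs, newest first."""
    # Sort by timestamp in decreasing order, mirroring how the version
    # dimension is ordered, then keep only the requested number of entries.
    newest_first = sorted(cell.items(), key=lambda item: item[0], reverse=True)
    return newest_first[:versions]

cell = {1588774335979: "bela", 1588774360746: "bini"}

latest_three = get_versions(cell, versions=3)  # only two versions exist so far
latest_one = get_versions(cell)                # default: just the newest value
```

Asking for 3 versions returns only the 2 that exist, which matches the "2 row(s)" in the shell output.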


· Update the name column two more times.

hbase(main):045:0> put 'student_management:student','1','personal_details:name','rita'                                                

0 row(s) in 0.0040 seconds                                                                                                            


hbase(main):046:0> put 'student_management:student','1','personal_details:name','riya'                                                

0 row(s) in 0.1590 seconds    

 · Now, when reading the data, all four historical values of the name column appear.

Because the personal_details column family of the student table is declared with 4 versions, it can preserve up to 4 historical values of each column.

hbase(main):048:0> get 'student_management:student','1',{ COLUMN => 'personal_details:name', VERSIONS => 4}                           

COLUMN                             CELL                                                                                               

 personal_details:name             timestamp=1588774548028, value=riya                                                                

 personal_details:name             timestamp=1588774535977, value=rita                                                                

 personal_details:name             timestamp=1588774360746, value=bini                                                                

 personal_details:name             timestamp=1588774335979, value=bela                                                                

4 row(s) in 0.3020 seconds                      


· If we update the name column once more and then fetch the record, we see only the four latest values; the oldest one (bela) is no longer returned.


hbase(main):050:0> put 'student_management:student','1','personal_details:name','sonali'                                              

0 row(s) in 0.0310 seconds                                                                                                            


hbase(main):051:0> get 'student_management:student','1',{ COLUMN => 'personal_details:name', VERSIONS => 4}                           

COLUMN                             CELL                                                                                               

 personal_details:name             timestamp=1588774696983, value=sonali                                                              

 personal_details:name             timestamp=1588774548028, value=riya                                                                

 personal_details:name             timestamp=1588774535977, value=rita                                                                

 personal_details:name             timestamp=1588774360746, value=bini                                                                

4 row(s) in 0.0070 seconds                                             
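The whole walkthrough can be replayed with a small Python model. This is a sketch under stated assumptions, not HBase's implementation: `VersionedCell` is a hypothetical class, and it discards the excess version immediately on write for simplicity, whereas HBase hides excess versions from reads and physically removes them later, during major compaction.

```python
class VersionedCell:
    """Toy model of one HBase cell that keeps at most `max_versions` values."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        self.versions[timestamp] = value
        # Keep only the newest `max_versions` timestamps. Eager pruning is an
        # assumption of this sketch, not a description of HBase internals.
        for ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[ts]

    def get(self, versions=1):
        # Newest first, limited to the requested number of versions.
        newest_first = sorted(self.versions.items(), reverse=True)
        return [value for _, value in newest_first[:versions]]


# Replay the five puts from the shell session, using the timestamps shown above.
name = VersionedCell(max_versions=4)
for ts, value in [
    (1588774335979, "bela"),
    (1588774360746, "bini"),
    (1588774535977, "rita"),
    (1588774548028, "riya"),
    (1588774696983, "sonali"),
]:
    name.put(value, ts)

visible = name.get(versions=4)
```

After the fifth put, `visible` holds sonali, riya, rita and bini, and bela has dropped out, just as in the last get above.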


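 This is how updates, and the removal of old values, take place over HDFS storage through Apache HBase: each update is written as a new version, and versions beyond the configured maximum are no longer returned.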

Happy learning!!

See you in the next blog!
