A version is a timestamp value written alongside each cell value.
By default, the timestamp is the time on the RegionServer when the data was written, but a different timestamp can be specified when data is put into the cell.
HBase is built on top of HDFS, so it does not overwrite values in place during an update; instead it stores each new value together with the time it was written. To support this, HBase creates a new version of the cell being updated. The default maximum number of versions is 1; prior to HBase 0.96 it was 3.
You can specify the number of versions to keep for a column family when creating the HBase table, or later with the alter command.
Versioning also helps to maintain the history of table values.
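To make the idea concrete before the shell walkthrough, here is a minimal Python sketch of a single versioned cell. It is a toy model only, not HBase's real implementation: the class name VersionedCell and its methods are made up for illustration, and it drops extra versions immediately, whereas HBase defers the physical removal of old versions to compaction.

```python
import time


class VersionedCell:
    """Toy model of one HBase cell (hypothetical class, for illustration only)."""

    def __init__(self, max_versions=1):
        # Mirrors the VERSIONS setting of a column family; the default is 1.
        self.max_versions = max_versions
        self._versions = {}  # timestamp in milliseconds -> value

    def put(self, value, timestamp=None):
        # Without an explicit timestamp, stamp the write with the current time,
        # much as the RegionServer does by default.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._versions[timestamp] = value
        # Keep only the newest max_versions entries.
        for old_ts in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[old_ts]

    def get(self, versions=1):
        # Return up to `versions` (timestamp, value) pairs, newest first.
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:versions]


cell = VersionedCell(max_versions=2)
cell.put("a", timestamp=100)
cell.put("b", timestamp=200)
cell.put("c", timestamp=300)
print(cell.get(versions=2))  # [(300, 'c'), (200, 'b')]
```

With max_versions set to 2, the third put pushes the oldest value out, which is the same behaviour the shell example shows with 4 versions.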
Example:
Let's create an HBase table named student, in the student_management namespace, with two column families: personal_details and marks_details.
hbase(main):001:0> create 'student_management:student','personal_details','marks_details'
0 row(s) in 1.3920 seconds
· Alter the HBase table so that the personal_details column family keeps up to 4 versions.
hbase(main):041:0> alter 'student_management:student' , NAME => 'personal_details' , VERSIONS => 4
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.8990 seconds
· Now insert the first record into the HBase table, which is declared with 4 versions, and then update its name field once.
hbase(main):042:0> put 'student_management:student','1','personal_details:name','bela'
0 row(s) in 0.1180 seconds
hbase(main):043:0> put 'student_management:student','1','personal_details:name','bini'
0 row(s) in 0.0050 seconds
· Now, when checking the value of the name field, the most recently updated value appears at the top.
*Note: In HBase, row and column keys are expressed as bytes, while the version is specified as a long integer. The version dimension is stored in decreasing order, so that when reading from a store file the most recent values are found first.
hbase(main):044:0> get 'student_management:student','1',{ COLUMN => 'personal_details:name', VERSIONS => 3}
COLUMN CELL
personal_details:name timestamp=1588774360746, value=bini
personal_details:name timestamp=1588774335979, value=bela
2 row(s) in 0.0030 seconds
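The newest-first ordering seen in this output can be sketched in Python. This is only an illustration of the sort order described in the note, under the assumption that keys compare as (row, column, timestamp) with the timestamp reversed; the helper key_order is a made-up name, and the two cells are copied from the shell output.

```python
# (row, column, timestamp, value) tuples taken from the get output.
cells = [
    ("1", "personal_details:name", 1588774335979, "bela"),
    ("1", "personal_details:name", 1588774360746, "bini"),
]


def key_order(cell):
    """Sort key: row and column ascending as bytes, timestamp descending."""
    row, column, timestamp, _value = cell
    # Negating the long timestamp makes larger (newer) timestamps sort first.
    return (row.encode(), column.encode(), -timestamp)


ordered = sorted(cells, key=key_order)
for row, column, timestamp, value in ordered:
    print(column, f"timestamp={timestamp}, value={value}")
```

Because the newer timestamp sorts first, a reader that stops at the first matching entry gets the latest value without scanning older ones.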
· Again, update the name field two more times.
hbase(main):045:0> put 'student_management:student','1','personal_details:name','rita'
0 row(s) in 0.0040 seconds
hbase(main):046:0> put 'student_management:student','1','personal_details:name','riya'
0 row(s) in 0.1590 seconds
· Now, when reading the data, all four historical values of the name field appear,
because the personal_details column family of the student table is declared with 4 versions. It can therefore preserve up to 4 historical values of each field.
hbase(main):048:0> get 'student_management:student','1',{ COLUMN => 'personal_details:name', VERSIONS => 4}
COLUMN CELL
personal_details:name timestamp=1588774548028, value=riya
personal_details:name timestamp=1588774535977, value=rita
personal_details:name timestamp=1588774360746, value=bini
personal_details:name timestamp=1588774335979, value=bela
4 row(s) in 0.3020 seconds
· If we update the name field once more and fetch the record again, only the four most recent values are returned; the oldest one (bela) is no longer visible.
hbase(main):050:0> put 'student_management:student','1','personal_details:name','sonali'
0 row(s) in 0.0310 seconds
hbase(main):051:0> get 'student_management:student','1',{ COLUMN => 'personal_details:name', VERSIONS => 4}
COLUMN CELL
personal_details:name timestamp=1588774696983, value=sonali
personal_details:name timestamp=1588774548028, value=riya
personal_details:name timestamp=1588774535977, value=rita
personal_details:name timestamp=1588774360746, value=bini
4 row(s) in 0.0070 seconds
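The same result can be reproduced with a small Python sketch that replays the five writes from this walkthrough and keeps the four newest. It is a simplified model, not how HBase is implemented; the names writes, MAX_VERSIONS, visible and hidden are chosen here for illustration, while the timestamps and values are copied from the shell output.

```python
import heapq

# All five writes to personal_details:name for row '1', as (timestamp, value).
writes = [
    (1588774335979, "bela"),
    (1588774360746, "bini"),
    (1588774535977, "rita"),
    (1588774548028, "riya"),
    (1588774696983, "sonali"),
]
MAX_VERSIONS = 4  # matches VERSIONS => 4 set with the alter command

# nlargest returns the newest entries first, like the get output.
visible = heapq.nlargest(MAX_VERSIONS, writes)
hidden = sorted(set(writes) - set(visible))

for timestamp, value in visible:
    print(f"timestamp={timestamp}, value={value}")
print("no longer returned:", [value for _, value in hidden])
```

Running it lists sonali, riya, rita and bini, and reports bela as no longer returned.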
This is how Apache HBase handles updates, and the dropping of older values, on top of HDFS storage.
Happy learning!!
See you in the next blog!