Compared with the previous BD|CESGA platform, these are the main improvements:
- Hadoop has been upgraded to version 3.
- Spark 2.4 is now the default version.
- HUE has been upgraded to version 4.
- HDFS erasure coding is now available: it reduces storage overhead compared with the default 3x replication.
- Impala is now available as an alternative to Hive for interactive SQL queries.
- The HOME filesystem has been migrated from GlusterFS to the new NetApp storage system, which has greatly reduced its latency.
- Improved reliability:
  - The HDFS NameNode is now in HA configuration.
  - The YARN ResourceManager is now in HA configuration.
- Improved security:
  - SSL/TLS is now enabled for more secure communications.
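
To make the storage saving from HDFS erasure coding concrete, the following minimal sketch compares the raw disk usage of 3x replication with a Reed-Solomon layout. It assumes an RS(6,3) layout (6 data units plus 3 parity units, as in Hadoop 3's built-in `RS-6-3-1024k` policy); the policy actually enabled on the platform may differ, and the function name and the 100 TB dataset size are hypothetical, chosen only for illustration.

```python
def raw_storage_factor(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data for a Reed-Solomon layout."""
    return (data_units + parity_units) / data_units


# Default HDFS replication keeps three full copies of every block.
REPLICATION_FACTOR = 3.0

# Assumption: RS(6,3), i.e. 6 data units + 3 parity units per stripe.
ec_factor = raw_storage_factor(6, 3)

# Hypothetical dataset size, for illustration only.
dataset_tb = 100
replicated_tb = dataset_tb * REPLICATION_FACTOR
erasure_coded_tb = dataset_tb * ec_factor

print(f"3x replication: {replicated_tb:.0f} TB of raw storage")
print(f"RS(6,3) erasure coding: {erasure_coded_tb:.0f} TB of raw storage")
```

Under these assumptions erasure coding needs 1.5 bytes of raw storage per byte of data instead of 3, so the same dataset occupies half the disk space while still tolerating the loss of up to three units per stripe.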